REPORT  DOCUMENTATION  PAGE 


Form Approved
OMB No. 0704-0188


Public reporting burden for this collection of information is estimated to average 1 hour per response, including the time for reviewing instructions, searching existing data sources, gathering and maintaining the data needed, and completing and reviewing the collection of information. Send comments regarding this burden estimate or any other aspect of this collection of information, including suggestions for reducing this burden, to Washington Headquarters Services, Directorate for Information Operations and Reports, 1215 Jefferson Davis Highway, Suite 1204, Arlington, VA 22202-4302, and to the Office of Management and Budget, Paperwork Reduction Project (0704-0188), Washington, DC 20503.


1.  AGENCY  USE  ONLY  (Leave  blank) 


3.  REPORT  TYPE  AND  DATES  COVERED 


4.  TITLE  AND  SUBTITLE


6.  AUTHOR(S)


7.  PERFORMING  ORGANIZATION  NAME(S)  AND  ADDRESS(ES)

AFIT  Students  Attending: 


9.  SPONSORING /MONITORING  AGENCY  NAME(S)  AND  ADDRESS(ES) 
DEPARTMENT  OF  THE  AIR  FORCE 
AFIT/CI 

2950  P  STREET,  BLDG  125 
WRIGHT-PATTERSON  AFB  OH  45433-7765 


8.  PERFORMING  ORGANIZATION 
REPORT  NUMBER 


10.  SPONSORING /MONITORING 
AGENCY  REPORT  NUMBER 


11.  SUPPLEMENTARY  NOTES 


12a.  DISTRIBUTION /AVAILABILITY  STATEMENT 

Approved  for  Public  Release  IAW  AFR  190-1
Distribution  Unlimited 
BRIAN  D.  Gauthier,  MSgt,  USAF 
Chief  Administration 


12b.  DISTRIBUTION  CODE 


13.  ABSTRACT  (Maximum  200  words) 


19951031  107 




16.  PRICE  CODE 


17.  SECURITY  CLASSIFICATION  OF  REPORT
18.  SECURITY  CLASSIFICATION  OF  THIS  PAGE
19.  SECURITY  CLASSIFICATION  OF  ABSTRACT
20.  LIMITATION  OF  ABSTRACT


NSN  7540-01-280-5500


Standard  Form  298  (Rev.  2-89) 


A  Comparison  of  the  Performance  of  Non-Parametric 
Classifiers  with  Gaussian  Maximum  Likelihood  for  the 
Classification  of  Multispectral  Remotely  Sensed  Data 


by: 

Steven  W.  Nessmiller 


Captain,  USAF 

Bachelor  of  Science  -  Engineering  Sciences,  Control  Theory 
United  States  Air  Force  Academy 


A  thesis  submitted  in  partial  fulfillment  of  the  requirements  for  the  degree 
of  Master  of  Science  in  the  Center  for  Imaging  in  the  College  of  Imaging  Arts 
and  Sciences  of  the  Rochester  Institute  of  Technology 


Dr.  John  Schott,  Thesis  Advisor 
Dr.  Peter  Anderson,  Committee  Member 
Dr.  Roger  Easton,  Committee  Member 
Dr.  Harvey  Rhody,  Committee  Member 




A  COMPARISON  OF  THE  PERFORMANCE  OF  NON-PARAMETRIC 
CLASSIFIERS  WITH  GAUSSIAN  MAXIMUM  LIKELIHOOD  FOR  THE 
CLASSIFICATION  OF  MULTISPECTRAL  REMOTELY  SENSED  DATA 


by 

Steven  W.  Nessmiller 
B.S.  United  States  Air  Force  Academy 
(1988) 


A  thesis  submitted  in  partial  fulfillment  of  the 
requirements  for  the  degree  of  Master  of  Science 
in  the  Center  for  Imaging  Science 
Rochester  Institute  of  Technology 

September  1995




Center  for  Imaging  Science 
Rochester  Institute  of  Technology 
Rochester,  New  York 


Master  of  Science  Degree  Thesis 


The  Master  of  Science  degree  thesis  of  Steven  W.  Nessmiller 
has  been  examined  and  approved  by  the  thesis 
committee  as  satisfactory  for  the  thesis  requirement 
for  the  Master  of  Science  Degree 


Dr.  John  Schott,  Thesis  Advisor


Dr.  Roger  Easton,  Committee  Member


Dr.  Harvey  Rhody,  Committee  Member 


Center  for  Imaging  Science 
Rochester  Institute  of  Technology 
Rochester,  New  York 


Thesis  Release  Permission  Form 


Thesis  Title: 

A  Comparison  of  the  Performance  of  Non-Parametric  Classifiers 
with  Gaussian  Maximum  Likelihood  for  the  Classification 
of  Remotely  Sensed  Multispectral  Data 


I,  Steven  W.  Nessmiller,  grant  permission  to  the  Wallace  Memorial  Library  of  the 
Rochester  Institute  of  Technology  to  reproduce  this  thesis  in  whole  or  in  part 
provided  any  reproduction  will  not  be  of  commercial  use  or  for  profit. 


Steven  W.  Nessmiller,  Captain  USAF 



Date 


A  Comparison  of  the  Performance  of  Non-Parametric  Classifiers 
with  Gaussian  Maximum  Likelihood  for  the  Classification 
of  Remotely  Sensed  Multispectral  Data 

by: 

Steven  W.  Nessmiller 


1.0  Abstract 

This  study  compares  the  performance  of  two  non-parametric  classifiers  and 
Gaussian  Maximum  Likelihood  (GML)  for  the  classification  of  LANDSAT  TM  30-meter 
resolution  six-band  data.  The  mathematical  assumptions  made  in  developing  GML  are 
valid  if  the  pixels  that  constitute  the  training  classes  are  normally  distributed.  Since  it 
requires  a  model  of  the  data,  GML  is  termed  a  "parametric"  classifier.  Of  current  interest 
are  new  classification  methodologies  that  make  no  assumptions  about  the  statistical 
distribution of the pixels in the training class; these approaches are termed
"non-parametric" classifiers. This study compares two such classifiers to the traditional
GML approach to image classification: the n-Dimensional Probability Density Function
(nPDF), essentially a projection technique that reduces data dimensionality, and the
Fuzzy ARTMAP, an advanced neural network that utilizes fuzzy-set mathematics. The different
approaches  will  be  compared  for  statistical  classification  accuracy  and  computational 
efficiency.

2.0  Introduction

The  objective  of  this  thesis  is  to  compare  the  performance  of  two  non-parametric 
classifiers  (a  fuzzy  ARTMAP  neural  network  and  the  nPDF  algorithm)  with  the  classical 
Gaussian  maximum  likelihood  (GML)  approach.  All  parametric  classification  schemes, 
including  GML,  make  some  assumption  about  the  statistical  distribution  of  the 
training-class  pixel  intensity  vectors.  This  assumption  is  utilized  to  determine  a  statistical 
decision  rule  (i.e.  the  Mahalanobis  distance)  for  classification  purposes.  The  GML  has 
been  shown  to  be  a  robust  classifier,  but  its  effectiveness  suffers  when  the  training-class 
pixel  distribution  varies  markedly  from  normality  or  when  class  means  are  only  slightly 
separated  (Frey,  1994).  Non-parametric  classifiers  make  no  assumption  of  the 
distribution  of  the  pixels  in  the  training  classes.  As  such,  they  exhibit  increased  accuracy, 
but  are  extremely  sensitive  to  biased  training  sets. 

Four  multispectral  images  of  varying  composition  (i.e.  agricultural,  rural,  urban, 
and  forest)  will  be  classified  with  the  same  training  data  by  each  algorithm.  Four  images 
will  be  classified  by  each  algorithm  to  minimize  the  potential  of  anomalous  effects 
arising  from  a  particular  image  and  algorithm  combination. 

The  LANDSAT  satellite  collects  multispectral  information  as  it  orbits  above  the 
earth's  surface.  The  images  collected  by  this  system  are  composed  of  pixels  that 
nominally  represent  the  irradiance  gathered  from  a  30-meter-square  patch  of  the  earth's 
surface.  Each  pixel  in  an  image  can  be  described  by  a  six-dimensional  intensity  vector 
whose  elements  are  the  8-bit  digital  count  (DC)  value  (an  integer  ranging  from  0  to  255). 
The spectral sensitivity of each band is summarized in Table 2.0.1 (Richards, 1993). Note
that band 6, the thermal band, is not included because its information is not correlated
with that in the other bands. As such, it typically is not utilized in image
classification operations.

Table 2.0.1 - Spectral sensitivity of LANDSAT TM bands

TM Band    Bandpass (µm)    "Color"
1          0.45 - 0.52      blue
2          0.52 - 0.60      green
3          0.63 - 0.69      red
4          0.76 - 0.90      near IR
5          1.55 - 1.75      mid IR
7          2.08 - 2.35      mid IR

Similarly, the M-7 airborne system is another multispectral sensor. It utilizes a
line scanning system and can measure light at wavelengths in the range of 0.33 to 14.0
µm. Up to 19 different bands in this range can be collected simultaneously, with a typical
ground sample distance of approximately 5 meters. Table 2.0.2 details the selected bands
utilized in this study to mimic the LANDSAT TM sensor.

Table 2.0.2 - Spectral sensitivity of selected M-7 bands

M-7 Band   Bandpass (µm)    "Color"
3          0.44 - 0.46      blue
6          0.52 - 0.55      green
8          0.60 - 0.67      red
10         0.83 - 1.00      near IR
12         1.50 - 1.90      mid IR
13         2.10 - 2.60      mid IR

The  importance  of  high-quality  training  data  cannot  be  overstated,  especially  for  the 
non-parametric  classifiers.  By  using  the  same  training  data  for  each  classification 
algorithm, the potentially negative effects of any variation will be eliminated. The
distance separating the cluster centers of the target classes will be decreased to evaluate
the  impact  of  decreased  separation  on  the  effectiveness  of  each  algorithm.  Additionally, 
a  hybrid  classification  approach  which  combines  the  strengths  of  the  different  algorithms 
will  also  be  studied. 

The  different  approaches  to  classification  were  compared  statistically  in  terms  of 
their  classification  accuracy  via  a  confusion  matrix  and  the  kappa  coefficient. 
Computational  efficiency  was  compared  in  terms  of  elapsed  run  time,  training  time,  and 
system resource requirements. This comparison highlights the relative strengths and
weaknesses of the different classifiers and determines the set of conditions under which a
specific classifier is best employed.
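
The accuracy statistics named above can be sketched in a few lines. This is a minimal illustration only, assuming a square confusion matrix whose rows are reference classes and whose columns are assigned classes; the function name `kappa_coefficient` and the sample counts are hypothetical and are not taken from this study.

```python
import numpy as np

def kappa_coefficient(confusion):
    """Return (overall accuracy, kappa) for a square confusion matrix."""
    confusion = np.asarray(confusion, dtype=float)
    total = confusion.sum()
    # Observed agreement: fraction of pixels on the diagonal.
    observed = np.trace(confusion) / total
    # Chance agreement: sum over classes of (row total * column total) / N^2.
    expected = (confusion.sum(axis=0) * confusion.sum(axis=1)).sum() / total**2
    return observed, (observed - expected) / (1.0 - expected)

# Hypothetical two-class example (counts invented for illustration).
example = [[45, 5],
           [10, 40]]
accuracy, kappa = kappa_coefficient(example)
```

Kappa discounts the agreement that would be expected by chance, so it is lower than the raw overall accuracy whenever some chance agreement is possible.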

3.0  Objectives  and  Deliverables


Statement  of  Work 

♦  Implement  the  various  classifiers  in  the  Advanced  Visualization  System  (AVS),  an 
interactive  data  visualization  environment. 

♦  Select  at  least  four  multispectral  images  of  varying  composition. 

♦  Utilize  the  fuzzy  K-means  algorithm  to  collect  trusted  and  spectrally  pure  training 
data  for  the  classification  algorithms. 

♦  Utilize  an  AVS  module  to  collect  user-defined  training  data  for  the  classification 
algorithms. 

♦  Evaluate  the  classification  algorithms  statistically  in  terms  of  their  classification 
accuracy  on  both  dependent  and  independent  training  sets. 

♦  Evaluate  the  classification  algorithms  computationally  in  terms  of  their  efficiency. 

♦  Experiment  with  hybrid  classification  methodologies. 

List  of  Deliverables: 

♦  A Gaussian maximum likelihood classification module for the AVS environment.

♦  An  AVS  module  to  perform  fuzzy  K-means  clustering  to  create  truth  images  and 
collect  trusted  training  data. 

♦  An nPDF classification module for LANDSAT TM images for use in the AVS
environment that will permit user-definable classification boundaries.

♦  An AVS module that implements the fuzzy ARTMAP neural network algorithm to
classify LANDSAT TM images.

♦  An  AVS  module  to  compute  confusion  matrices  and  classification  accuracy 
statistics. 

♦  A written document covering the theory, background, approach, and results of the
study.

4.0  Background  and  Approach


4.1  Acquisition  of  Training  Data 

The  importance  of  high-quality  training  data  in  the  parametric  and  non-parametric 
classification  algorithms  has  been  introduced.  The  various  classification  algorithms 
require  labeled  training  data  to  calculate  representative  statistics,  to  train  a  neural 
network,  or  to  define  classification  boundaries.  The  method  in  which  these  data  will  be 
acquired from each image bears some explanation. Consider the simplified
representation of a LANDSAT TM scene (Figure 4.1.1) in the following discussion. We
will assume that there are four classes of interest: coniferous trees, deciduous trees,
water, and grass.

The Gaussian maximum likelihood (GML) parametric classifier assumes that the
training-class pixels are distributed in a multivariate normal manner. This assumption
can be validated through application of the central limit theorem, but this theorem also
imposes a restriction on the minimum number of pixels that must be present in each
training class. Swain and Davis (1978) state that the minimum number of pixels per
training class is 10D, where D is the number of bands or dimensions being utilized in the
classification algorithm, while 100D would be "highly desirable". Other references state
that 30 examples per class produce results that are "quite good" for a one-dimensional or
single-band case (Dougherty, 1990). The increase in the number of training-class pixels
required to describe a class in a feature space of higher dimensionality can be explained
intuitively. As dimensionality increases, the probability of a particular spectral band
being inadequately represented also increases. To offset this, the training class size must
grow with the dimensionality.


Figure 4.1.1 - Depiction of a LANDSAT TM scene

Because non-parametric classifiers make no assumption of the underlying
statistical distribution of the pixels, no mathematical inference of class membership may
be made. Because of this limitation, non-parametric algorithms require even more robust
training data than the parametric classifiers. With these considerations in mind, this study
will limit training class membership to not less than 30D and will strive for 100D
whenever  possible.  In  addition,  all  algorithms  will  use  the  same  training  data  to  classify 
each  image,  thereby  eliminating  any  potential  effects  due  to  variations  in  training  data. 
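
The size rule above is simple arithmetic on the number of bands D. The following sketch assumes the 30D floor and 100D goal stated in this section; the function names and the pixel counts for the four example classes are hypothetical.

```python
def training_class_sizes(num_bands, floor_factor=30, target_factor=100):
    """Return (minimum, preferred) pixel counts per training class."""
    return floor_factor * num_bands, target_factor * num_bands

def classes_below_floor(class_counts, num_bands):
    """List the class names whose pixel count is under the 30D floor."""
    minimum, _ = training_class_sizes(num_bands)
    return [name for name, count in class_counts.items() if count < minimum]

# Six TM bands give a floor of 180 pixels and a goal of 600 pixels per class.
minimum, preferred = training_class_sizes(6)

# Hypothetical pixel counts for the four example classes.
counts = {"deciduous": 640, "coniferous": 210, "water": 900, "grass": 150}
short = classes_below_floor(counts, 6)
```

A class flagged this way would need additional training regions before it could be used.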

Typically, training sets are defined by drawing polygons on the image that
delineate the extent of a target class. Figure 4.1.2 is identical to Figure 4.1.1 except for
the superimposed training class polygons. Class 1 represents deciduous trees, class 2
is composed of coniferous trees, class 3 is composed of water, and class 4 is made up of
grass pixels. Note that in this example there were not enough contiguous grass pixels in
one region to adequately describe the grass class. Because of this, two regions had to be
defined to meet the previously discussed minimum membership criteria. The algorithm
then extracts the pixels within each polygon and "labels" them as belonging to the
indicated target class. The classification algorithms can then be presented with a set of
labeled training data for each target class so that statistical or other calculations can be
made.

Figure 4.1.2 - Depiction of a LANDSAT TM scene with overlaid training class polygons

The  preceding  discussion  describes  the  most  common  method  to  collect  training 
data  for  supervised  classification  algorithms.  The  term  "supervised"  implies  that  a  human 
operator  specifies  the  class  membership  of  the  test  pixels.  One  of  the  main  shortcomings 
of  this  process  is  that  it  is  extremely  difficult,  if  not  impossible,  to  collect  sets  of  "pure" 
training  data.  This  difficulty  springs  from  the  inability  to  collect  and  label  pixels  that 
belong to only one spectral class. To envision this problem, consider the areas of the
LANDSAT TM image depicted in Figure 4.1.1 that are primarily composed of deciduous
and  coniferous  trees.  It  would  be  extremely  difficult  to  draw  polygons  that  encompassed 
just  tree  pixels  without  accidentally  including  some  of  the  background  grass  pixels. 
Training  sets  will  likely  always  be  "polluted"  with  such  impurities. 

Unsupervised  image  classification  techniques  often  rely  on  information  gathered 
from  determining  centers  of  clusters  belonging  to  naturally  occurring  classes  of  spectral 
vectors  in  the  image.  To  continue  with  the  simplistic  LANDSAT  image  example,  it 
would  be  reasonable  to  assume  that  there  are  4  major  clusters  of  spectral  vectors,  one  for 
each  of  the  broad  classes.  A  clustering  algorithm  typically  determines  the  location  of  the 
cluster centers in pixel space in an iterative manner. Once the locations of the clusters
have been determined, some type of minimum-distance-to-the-means classification can
be  readily  accomplished.  This  simple  classification  algorithm  determines  the  Euclidean 
distance  of  the  intensity  vector  of  a  pixel  to  the  various  cluster  centers,  and  then  assigns 
the  pixel  to  the  closest  class.  This  set  of  spectrally  pure  and  labeled  pixels  could  provide 
an  excellent  data  set  to  train  the  supervised  classification  algorithms. 
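
The minimum-distance-to-the-means rule described above can be sketched as follows. This is an illustration under stated assumptions rather than the module used in this study: pixels and cluster centers are taken to be rows of NumPy arrays, and the function name and the two-band sample values are hypothetical.

```python
import numpy as np

def minimum_distance_classify(pixels, centers):
    """Assign each pixel vector to the nearest cluster center (Euclidean)."""
    pixels = np.asarray(pixels, dtype=float)    # shape (num_pixels, num_bands)
    centers = np.asarray(centers, dtype=float)  # shape (num_clusters, num_bands)
    # Pairwise differences via broadcasting: (num_pixels, num_clusters, num_bands).
    diff = pixels[:, None, :] - centers[None, :, :]
    distances = np.sqrt((diff ** 2).sum(axis=2))
    # Index of the closest center, and the distance to it, for every pixel.
    return distances.argmin(axis=1), distances.min(axis=1)

# Hypothetical two-band example with two cluster centers.
centers = [[20, 30], [200, 180]]
pixels = [[25, 28], [190, 185], [100, 100]]
labels, dist = minimum_distance_classify(pixels, centers)
```

Returning the distance alongside the label makes it possible to discard pixels that lie far from every center, which is one way to keep the resulting training set spectrally pure.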

The ISODATA, or K-means, algorithm is a well-known method of determining
cluster centers. The parameter "K" represents the number of cluster centers to locate.
The algorithm was developed by G. H. Ball and D. J. Hall in 1967 and is implemented in
many multispectral image processing packages. In 1973, J. C. Dunn developed a version
of the algorithm which employed fuzzy set theory to determine the degree of membership
of a spectral vector to a given cluster. This version generally converges faster and is less
likely to divide naturally occurring clusters than the crisp set theory implementation.
The simplest explanation for the desirable qualities of this algorithm is that individual
pixels need not be modeled as belonging to a cluster, but the degree of their membership
can be determined. Pixels with high membership values greatly influence the cluster
center, while outlying pixels with lower membership values influence it to a lesser
degree. The mathematical development presented here essentially follows that in Dunn
(1973).

Fuzzy set theory, described in greater detail in section 4.4 of this report along with
the fuzzy ARTMAP neural network, is often "injected" into existing algorithms
by implementing a membership function. This function returns a value which represents
the degree to which a given data point belongs to a set. Large membership values,
typically thresholded to unity, indicate that the pixel intensity vector displays many of the
qualities of the set to a great degree. Lower values, typically closer to zero, indicate that
the particular example does not fully represent all qualities of the set. Membership values
are  analogous  to  probabilities,  but  have  different  underlying  mathematical  properties  and 
cannot  be  treated  in  the  same  manner.  The  great  strength  that  this  type  of  fuzzy 
measurement  brings  to  an  algorithm  is  that  data  elements  can  contribute  to  multiple  sets 
rather  than  to  the  membership  of  only  one  set.  Consider  the  representation  of  the 
boundary  between  neighboring  sets.  In  one  case  the  sets  have  crisp  boundaries;  in  the 
other,  the  boundaries  are  fuzzy  (Figure  4.1.3). 


[Figure 4.1.3: two plots of membership (vertical axis, 0 to 1.5) versus row (horizontal axis). Left panel: crisp1(row) and crisp2(row) for sets A and B. Right panel: fuzzy1(row) and fuzzy2(row) for sets A and B.]

Figure 4.1.3 - Depiction of the boundary between two crisp sets (A and B at left)
and the boundary defined by the membership function between two fuzzy sets (right).


The vertical axis of each plot represents the membership that a given point has in set A or
set B. For the crisp set case, this value can be either 0 or 1, while the points belonging to
the fuzzy set can have membership values anywhere in the unit interval. To see the
power of a fuzzy set representation, consider the pixel located at x = +2. In the crisp set
representation, the only information we have concerning this data point is that it has
membership in set B. No knowledge about the strength of its membership, or the fact
that it is located near a boundary, is conveyed. In the fuzzy-set view of the same data
point (Figure 4.1.4), we see that this data point has strong membership (0.9) in set B, but
that it also has some of the qualities (0.1) of the data points that constitute set A. The
information conveyed through the membership function allows us to treat this data point
more appropriately. Whenever a decision is made to assign a data point to a particular
set, some information is lost. Fuzzy-set theory combats this information loss by relaying
a confidence measure about the quality or strength of the decision.

Figure 4.1.4 - Depiction of the membership of a point in two neighboring fuzzy sets
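
The contrast between the crisp and fuzzy views of the point at x = +2 can be reproduced numerically. The sketch below rests on two assumptions that are not given in the text: the boundary between sets A and B is placed at x = 0, and the fuzzy membership curve is taken to be logistic with a slope chosen so that the membership in set B at x = +2 equals 0.9. The function names are hypothetical.

```python
import math

# Assumed slope: chosen so that membership in set B is 0.9 at x = +2.
SLOPE = math.log(9.0) / 2.0

def crisp_membership(x):
    """Crisp sets: a point belongs wholly to A (x < 0) or wholly to B."""
    return (1.0, 0.0) if x < 0 else (0.0, 1.0)

def fuzzy_membership(x):
    """Fuzzy sets: memberships in (A, B) vary smoothly across the boundary."""
    in_b = 1.0 / (1.0 + math.exp(-SLOPE * x))
    return 1.0 - in_b, in_b

crisp_a, crisp_b = crisp_membership(2.0)   # all-or-nothing assignment to B
fuzzy_a, fuzzy_b = fuzzy_membership(2.0)   # graded membership in A and B
```

The crisp pair carries only the label, while the fuzzy pair also records how close the point lies to the boundary.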

At the start of the fuzzy clustering algorithm, the number K of clusters to be found
must be estimated. This number can be approximated by visually inspecting the image, or set
based on a priori knowledge of how many classes are necessary. Many non-trivial papers
are devoted to intelligent determination of the number of clusters present in a data set, but
further discussion of this subject is beyond the scope of this report. After a value for K is
selected, K pixels are chosen at random from the input image to serve as the initial cluster
centers. This ensures that the algorithm starts with valid positions inside the solution space.
Another  way  to  begin  the  algorithm  would  be  to  determine  the  minimum  and  maximum 
spectral  intensity  vectors  across  all  bands,  and  then  construct  a  line  in  the  N-dimensional 
space  (where  N  represents  the  number  of  data  bands)  with  K  evenly  spaced  points  as 
initial  solutions.  The  membership  with  respect  to  each  of  the  K  clusters  is  then 
determined.  Consider  the  following  membership  equation: 

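$$ m_i(\mathbf{x}) = \frac{1 / \|\mathbf{x} - \mathbf{u}_i\|^2}{\sum_{j=1}^{K} 1 / \|\mathbf{x} - \mathbf{u}_j\|^2} \quad \text{for } 1 \le i \le K \qquad (4.1.1) $$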


where x is the N-dimensional spectral vector for a given pixel, u_i is the spectral vector
representing the center of the ith cluster, and m_i(x) is the membership of the pixel with
respect to the ith cluster. The distance measure employed in this implementation should
be recognized as the square of the simple Euclidean distance. Note that the membership
equation approaches unity for points near to one class and far from all others. Numerical
problems associated with being very close to (or directly on) a cluster center are handled by
checking for very small distance measures and then setting the membership of the pixel to
that cluster to one. Once the memberships of the pixels in the image have been
determined with respect to each cluster, the new cluster centers can be calculated.


Consider  the  following  equation: 


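$$ \mathbf{u}_i = \frac{\sum_{\mathbf{x} \in X} m_i(\mathbf{x})^2 \, \mathbf{x}}{\sum_{\mathbf{x} \in X} m_i(\mathbf{x})^2} \quad \text{for } 1 \le i \le K \qquad (4.1.2) $$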


where x ∈ X implies that the sum is taken over all the pixels in the image. Note that this
equation must be calculated K times, once for each cluster center u_i. This forms a weighted
average of the influence of each pixel on each cluster center through the use of the
membership function. Once the new cluster centers have been calculated, the distance
that the center vectors have moved since the last iteration is determined. If the maximum
movement is below some user-defined threshold, we can conclude that the cluster centers
have converged to their final locations. After the cluster centers have converged, the
"cluster map" image is constructed by checking the membership value of each pixel in
the image with respect to all of the clusters. If the largest membership value is equal to or
greater than a user-defined membership threshold, the pixel is "labeled" as part of that
class. A set of training data can then be built by scanning the cluster map image for
pixels assigned to clusters, retrieving the corresponding multispectral pixel from the
original image, sorting them by cluster number, and writing the resulting ordered set to
disk. In pseudocode, the entire fuzzy K-means algorithm can be expressed as:


pick K clusters at random from the image
while clusters have not converged
    calculate membership
    calculate new cluster centers
    check for convergence
build cluster map
build training set
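
The pseudocode can be turned into a short runnable sketch. This is an illustration under assumptions, not the AVS module built for this study: memberships use inverse squared Euclidean distances normalized over the clusters, the center update weights each pixel by its squared membership as in Dunn's formulation, and very small distances are clamped instead of setting the membership to exactly one. The function name, parameter names, and default thresholds are hypothetical.

```python
import numpy as np

def fuzzy_kmeans(pixels, k, initial_centers=None, move_tol=1e-3,
                 member_tol=0.8, max_iter=100, seed=0):
    """Fuzzy K-means sketch; returns (centers, labels), label -1 = unassigned."""
    pixels = np.asarray(pixels, dtype=float)            # (num_pixels, num_bands)
    if initial_centers is None:
        # Pick K pixels at random from the image as the starting centers.
        rng = np.random.default_rng(seed)
        centers = pixels[rng.choice(len(pixels), size=k, replace=False)]
    else:
        centers = np.asarray(initial_centers, dtype=float)

    def memberships(current):
        # Squared Euclidean distance of every pixel to every center.
        d2 = ((pixels[:, None, :] - current[None, :, :]) ** 2).sum(axis=2)
        inv = 1.0 / np.maximum(d2, 1e-12)               # clamp tiny distances
        return inv / inv.sum(axis=1, keepdims=True)     # each row sums to one

    for _ in range(max_iter):
        m2 = memberships(centers) ** 2                  # squared weights (assumed)
        new_centers = (m2.T @ pixels) / m2.sum(axis=0)[:, None]
        moved = np.sqrt(((new_centers - centers) ** 2).sum(axis=1)).max()
        centers = new_centers
        if moved < move_tol:                            # convergence check
            break

    # Cluster map: label a pixel only if its best membership is strong enough.
    m = memberships(centers)
    labels = np.where(m.max(axis=1) >= member_tol, m.argmax(axis=1), -1)
    return centers, labels

# Hypothetical six-band data drawn around two spectral means.
rng = np.random.default_rng(1)
image = np.vstack([rng.normal(50, 2, (40, 6)), rng.normal(150, 2, (40, 6))])
centers, labels = fuzzy_kmeans(image, k=2)
# Training set: the original pixel vectors grouped by cluster number.
training_set = {c: image[labels == c] for c in range(2)}
```

Pixels whose largest membership falls below `member_tol` receive the label -1 and are left out of the training set, which mirrors the membership threshold described above.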

It is important to note that while we have constructed a set of labeled, spectrally
pure pixels for a data set, we have potentially "colored" them by the collection
methodology. We have employed a Euclidean distance measure that we will later see is
unable to account for a data set's inherent distribution. Also note that the algorithm may
not find the clusters in the image that are of interest to the analyst and, furthermore, it may
then be difficult for the analyst to assign meaningful labels to each of the derived clusters.

Image  Classification  Algorithms


4.2  Gaussian  Maximum  Likelihood 


Gaussian  Maximum  Likelihood  (GML)  is  perhaps  the  most  popular  classification 
algorithm  due  to  its  employment  of  classical  mathematics  and  the  fact  that  it  generates  a 
measurement  of  membership  certainty  of  a  pixel  to  a  class.  GML  is  widely  taught  in 
introductory  courses  in  both  remote  sensing  and  pattern  recognition,  and  it  is 
implemented  in  many  image  processing  packages.  As  such,  it  is  a  standard  to  which  the 
performance  of  other  algorithms  can  be  compared. 

GML  is  a  parametric  classifier,  based  on  the  assumption  of  normally  distributed 
pixel  intensity  vectors.  It  has  been  shown  that  the  assumption  of  normally  distributed 
pixel  intensity  vectors  is  valid  in  a  typical  remote  sensing  application  (Frey,  1994).  This 
can  be  attributed  to  the  averaging  effect  of  the  LANDSAT  TM  sensor  as  it  detects  the 
irradiance  emitted  or  reflected  from  a  nominally  30-meter  square  patch  on  the  earth. 
Recall that the density of the sum of two independent random variables is the convolution
of their individual densities. The averaging of the TM sensor essentially convolves the
probability density functions of the materials from which the detected photons were
emitted or reflected. Repeated convolution of the individual probability density
functions rapidly approaches a normal distribution. This is the basis for the central limit
theorem. Nevertheless, the GML algorithm has also been shown to be extremely robust,
even when dealing with data sets that deviate markedly from normality (Frey, 1994). The
development presented here essentially follows that in Richards (1993), with the link to
the χ² (chi-squared) distribution developed by Johnson and Wichern (1992).

4.2  Mathematical  Development  of  GML

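In the development of the algorithm we will assume that the components of the
n-dimensional vector x represent the intensity vector of a pixel, and the m-dimensional
vector w represents the target classification classes.

$$ \mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}, \qquad \mathbf{w} = \begin{pmatrix} w_1 \\ w_2 \\ \vdots \\ w_m \end{pmatrix} $$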

From the training data, we can calculate the following conditional probability,

$$ p(\mathbf{x} \mid w_i) \qquad (4.2.1) $$

which should be interpreted as the probability that a pixel belonging to class w_i has
spectral vector x. A much more useful measure to develop would be:

$$ p(w_i \mid \mathbf{x}) \qquad (4.2.2) $$

which is the probability of membership in class w_i for a specific pixel. This measure
could be utilized for classification by determining the most probable class for a pixel.

This  measure  can  be  derived  through  the  application  of  Bayes'  Rule  which  can  be  stated 


as: 


p{ar\b) 


(4.2.3) 

(4.2.4) 


pia\b)= 

similarly,  p{b\a)  - 

where  the  n  operator  indicates  simple  intersection  of  two  sets.  By  setting  the  equivalent 
terms  equal  and  regrouping,  we  can  state: 

pm  =  5^ 

In  our  previous  notation,  a  useful  result  is  obtained: 


p{w,)p{\\wi) 

p(x) 


(4.2.6) 
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Two  terms  in  equation  4.2.6  require  some  clarification.  The  term  p{wi)  often  is  titled  an 
a  prior  probability  as  it  acts  as  a  weighting  ftmction  which  reflects  how  much  of  a  given 
image  is  composed  of  a  particular  class.  This  term  normally  is  set  equal  to  1/M ,  where 
M  is  the  number  of  classes.  When  this  condition  is  obviously  not  true,  the  term  will  be 
reasonably  estimated.  The  p(i)  term  represents  the  probability  of  a  pixel  with  spectral 
vector  X  being  present  in  the  image.  Note  that  pixels  are  classified  by  comparing  the 
various  conditional  probabilities  of  each  spectral  intensity  vector  for  each  target 
classification  class.  When  utilizing  this  methodology,  the  term  p(x)  will  appear  in  the 
denominator  of  both  conditional  probabilities  on  both  sides  of  the  inequality.  Since  it  is  a 
common  term,  it  can  be  canceled.  This  results  in  the  following  classification  logic: 

X  €  Wi  if /i(w,|x)  > p(w/|x)  for  all  i  (4.2.7) 

It is important to realize that equation 4.2.7 states that all spectral vectors present in the
image will be assigned to one of the target classes in the image. This may not be
desirable, and the requirement can be eliminated by setting a threshold for class
membership and assigning any pixel whose probability is less than the threshold to a
"background" class.
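
The decision rule of equations 4.2.6 and 4.2.7, together with the optional threshold, can be sketched as follows. This is only an illustration: the function name `classify_bayes`, the example likelihood values, and the choice to compute $p(\mathbf{x})$ as the sum of the numerators (which assumes the target classes are exhaustive) are assumptions made here, not part of the original development.

```python
def classify_bayes(likelihoods, priors=None, threshold=None):
    """Pick the most probable class for one pixel (equations 4.2.6 and 4.2.7).

    likelihoods: p(x|w_i) for each of the M classes, already evaluated.
    priors:      p(w_i); defaults to 1/M for every class, as in the text.
    threshold:   optional minimum posterior; below it the pixel is
                 returned as None ("background").
    """
    m = len(likelihoods)
    if priors is None:
        priors = [1.0 / m] * m
    # Numerators of equation 4.2.6.
    joint = [lik * pri for lik, pri in zip(likelihoods, priors)]
    # Assumption: the M classes are exhaustive, so p(x) is the sum of the numerators.
    evidence = sum(joint)
    if evidence == 0.0:
        return None
    posteriors = [j / evidence for j in joint]
    best = max(range(m), key=lambda i: posteriors[i])
    if threshold is not None and posteriors[best] < threshold:
        return None
    return best


# Hypothetical likelihood values for a three-class problem.
print(classify_bayes([0.2, 0.6, 0.1]))
print(classify_bayes([0.2, 0.6, 0.1], priors=[0.8, 0.1, 0.1]))
print(classify_bayes([0.2, 0.6, 0.1], threshold=0.9))
```

With equal priors the class with the largest likelihood wins; unequal priors can change the decision, and a high threshold sends the pixel to the background class.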

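If we now assume that the spectral intensity vectors are distributed in a
multivariate normal manner, the conditional probability can be stated in the following
manner:

$$p(\mathbf{x} \mid w_i) = \frac{1}{(2\pi)^{D/2}\,|\Sigma_i|^{1/2}} \exp\left\{ -\tfrac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_i)^T \Sigma_i^{-1} (\mathbf{x} - \boldsymbol{\mu}_i) \right\} \tag{4.2.8}$$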

where $\Sigma_i^{-1}$ is the inverse of the variance-covariance matrix for class $i$, $\boldsymbol{\mu}_i$ is the mean
vector for the "ith" class, $\mathbf{x}$ is the spectral intensity vector whose class membership is
being evaluated, and $D$ is the number of dimensions or spectral bands being used. The
operator "| |" designates matrix determinant and the transpose operator "T" converts
column vectors to row vectors and vice versa. The variance-covariance matrix contains a
wealth of information concerning the dispersion of data in a target class. As such, it
warrants some explanation. The general form of a variance-covariance matrix is:

$$\Sigma = \begin{pmatrix} \sigma^2_{11} & \sigma^2_{12} & \cdots & \sigma^2_{1D} \\ \sigma^2_{21} & \sigma^2_{22} & \cdots & \sigma^2_{2D} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma^2_{D1} & \sigma^2_{D2} & \cdots & \sigma^2_{DD} \end{pmatrix}$$

Figure 4.2.1 - general form of $\Sigma$ matrix

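where $\sigma^2_{ij}$ is the covariance of band $i$ with respect to band $j$. The matrix can be calculated
by:

$$\Sigma_i = E\{(\mathbf{x} - \boldsymbol{\mu}_i)(\mathbf{x} - \boldsymbol{\mu}_i)^T\} \tag{4.2.9}$$

where $E$ is the expectation operator. Typically an unbiased estimator is utilized to
approximate the variance-covariance matrix numerically:

$$\Sigma_i = \frac{1}{K-1} \sum_{k=1}^{K} (\mathbf{x}_k - \boldsymbol{\mu}_i)(\mathbf{x}_k - \boldsymbol{\mu}_i)^T \tag{4.2.10}$$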

where $K$ represents the number of data elements in the training class. It is important to
realize that this calculation must be performed once for each of the $M$ classes, so that all the
target classification classes are described mathematically. Also realize that $\sigma^2_{ij}$ must
equal $\sigma^2_{ji}$ and that $\sigma^2_{ii}$ is the simple empirical variance of band $i$. In general, if any of the
off-diagonal terms ($\sigma^2_{ij}$) have large amplitudes, bands $i$ and $j$ are highly correlated. This
implies that information about one band can be used to predict the value of another, or
that the information contained within the bands is not independent, but correlated. More
simply put, if bands $i$ and $j$ are positively correlated, an increase in band $i$ will be
accompanied by a corresponding increase in the observed values in band $j$. If the
off-diagonal values are small, then the bands are essentially uncorrelated and one band cannot be
described by some linear combination of the others. As previously mentioned, the information
contained within the variance-covariance matrix permits more accurate modeling of the
dispersion of the data that constitutes a training class. Specifically, the information
contained within a variance-covariance matrix defines an n-dimensional hyperellipsoid in
feature space. The eigenvalues of the matrix determine the lengths of the axes of the
hyperellipsoid (each axis length is proportional to the square root of the corresponding
eigenvalue), while the associated eigenvectors determine their orientation (Johnson and
Wichern, 1992).
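
As a small numerical illustration of equation 4.2.10 and of the eigenvalue interpretation, the sketch below estimates the variance-covariance matrix of a two-band training class and computes its eigenvalues in closed form. The training values and the helper names (`mean_vector`, `covariance_matrix`, `eigenvalues_2x2`) are hypothetical and chosen only for this example; a real implementation would more likely rely on a numerical library.

```python
import math


def mean_vector(samples):
    """Per-band mean of a list of equal-length pixel vectors."""
    k = len(samples)
    return [sum(s[b] for s in samples) / k for b in range(len(samples[0]))]


def covariance_matrix(samples):
    """Unbiased estimate (division by K - 1), following equation 4.2.10."""
    k = len(samples)
    d = len(samples[0])
    mu = mean_vector(samples)
    cov = [[0.0] * d for _ in range(d)]
    for s in samples:
        diff = [s[b] - mu[b] for b in range(d)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += diff[i] * diff[j]
    return [[c / (k - 1) for c in row] for row in cov]


def eigenvalues_2x2(cov):
    """Eigenvalues of a symmetric 2x2 matrix, largest first."""
    a, b, c = cov[0][0], cov[0][1], cov[1][1]
    half_trace = (a + c) / 2.0
    root = math.sqrt(((a - c) / 2.0) ** 2 + b * b)
    return half_trace + root, half_trace - root


# Hypothetical two-band training pixels with strongly correlated bands.
training = [(10, 12), (12, 15), (14, 15), (16, 19), (18, 19)]
sigma = covariance_matrix(training)
lam_major, lam_minor = eigenvalues_2x2(sigma)
print(sigma)
print(math.sqrt(lam_major), math.sqrt(lam_minor))  # relative axis lengths
```

For this data the off-diagonal covariance is nearly as large as the variances, so one eigenvalue dominates and the class ellipse is long and thin.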

Returning  to  the  development  of  GML,  we  will  at  this  point  introduce  the 
discriminant  function,  which  will  result  in  some  mathematical  convenience: 

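$$g_i(\mathbf{x}) = \ln\{p(\mathbf{x} \mid w_i)\} \tag{4.2.11}$$

and applying this operation to the multivariate normal distribution yields:

$$g_i(\mathbf{x}) = -\tfrac{D}{2}\ln(2\pi) - \tfrac{1}{2}\ln|\Sigma_i| - \tfrac{1}{2}(\mathbf{x} - \boldsymbol{\mu}_i)^T \Sigma_i^{-1} (\mathbf{x} - \boldsymbol{\mu}_i) \tag{4.2.12}$$

Note that a number of common terms can be eliminated in classification, resulting in a
discriminant function that can be stated as:

$$g_i(\mathbf{x}) = -\ln|\Sigma_i| - (\mathbf{x} - \boldsymbol{\mu}_i)^T \Sigma_i^{-1} (\mathbf{x} - \boldsymbol{\mu}_i) \tag{4.2.13}$$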

Classification  may  be  accomplished  by  employing  the  following  rule: 

$$\mathbf{x} \in w_i \ \text{ if } \ g_i(\mathbf{x}) > g_j(\mathbf{x}) \ \text{ for all } j \neq i \tag{4.2.14}$$

Note that the last term of equation 4.2.13 resembles a distance measure. By eliminating
the negative signs and considering the special case where $\Sigma_i$ is the identity matrix, the
classification rule reduces to the simple Euclidean distance measure. This distance
measure can be described mathematically as:

$$d_E(\mathbf{x}, \boldsymbol{\mu}_i)^2 = (\mathbf{x} - \boldsymbol{\mu}_i)^T (\mathbf{x} - \boldsymbol{\mu}_i) \tag{4.2.15}$$

and the classification rule becomes:

$$\mathbf{x} \in w_i \ \text{ if } \ d_E(\mathbf{x}, \boldsymbol{\mu}_i)^2 < d_E(\mathbf{x}, \boldsymbol{\mu}_j)^2 \ \text{ for all } j \neq i \tag{4.2.16}$$

In  feature  space,  equation  4.2.16  defines  a  set  of  hyperspheres  located  concentrically 
about  the  mean  vector  of  the  class.  This  Euclidean  distance  measure  is  the  heart  of  the 
minimum-distance-to-the-means  classification  methodology. 

Incorporation  of  the  data  distribution  information  in  the  variance-covariance 
matrix  into  a  distance  measure  can  be  accomplished  quite  readily.  Consider  the  following 
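measure, often termed the Mahalanobis or statistical distance:

$$d_M(\mathbf{x}, \boldsymbol{\mu}_i)^2 = \ln|\Sigma_i| + (\mathbf{x} - \boldsymbol{\mu}_i)^T \Sigma_i^{-1} (\mathbf{x} - \boldsymbol{\mu}_i) \tag{4.2.17}$$

A development identical to that leading to equation 4.2.14 results in the following
classification rule:

$$\mathbf{x} \in w_i \ \text{ if } \ d_M(\mathbf{x}, \boldsymbol{\mu}_i)^2 < d_M(\mathbf{x}, \boldsymbol{\mu}_j)^2 \ \text{ for all } j \neq i \tag{4.2.18}$$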

which  represents  the  desired  Gaussian  maximum  likelihood  classifier.  As  compared  to 
the  previous  Euclidean  distance  measure  classifier  (equation  4.2.16)  note  that  this 
algorithm  defines  an  n-dimensional  hyperellipsoid  in  feature  space.  This  is  accomplished 
by  the  "space  scaling"  effect  of  the  variance-covariance  matrix. 

It is generally desirable, though computationally intensive, to account for the
dispersion of the training data. The validity of this statement can be visualized readily.
Consider the identical two-dimensional feature spaces in Figures 4.2.3a and 4.2.3b. The
information about the covariance of the cluster is not utilized in the first example
(Euclidean distance measure and minimum-distance-to-the-means), but is employed in
the second case (Gaussian maximum likelihood employing the Mahalanobis distance).
The pixel to be classified is represented by the cross near the middle of the feature space.
In Figure 4.2.3a, the pixel of interest would be assigned to class one as its vector is closer
to $\boldsymbol{\mu}_1$ in a Euclidean sense. By taking into account the inherent dispersion of the data, as
depicted in Figure 4.2.3b, the GML classifier would assign the pixel to class two. Note
that this effect is most easily explained by realizing that the information contained within
the variance-covariance matrix essentially scales feature space differently in different
directions, thereby accounting for the data's dispersion.

Figure 4.2.3a - hyperspheres resulting from minimum distance to the means (axis: band x)
Figure 4.2.3b - hyperellipsoids resulting from GML (axis: band x)
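
The situation depicted in Figures 4.2.3a and 4.2.3b can be reproduced numerically. The sketch below compares the Euclidean rule of equation 4.2.16 with the distance of equation 4.2.17 for a two-band case. The class means, covariance matrices, and pixel value are invented for illustration, and the 2x2 inverse is written out by hand so that only the standard library is needed.

```python
import math


def euclidean_sq(x, mu):
    """Squared Euclidean distance, equation 4.2.15."""
    return sum((xi - mi) ** 2 for xi, mi in zip(x, mu))


def gml_distance_sq(x, mu, cov):
    """ln|Sigma| + (x - mu)^T Sigma^-1 (x - mu), equation 4.2.17, for two bands."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0.0:
        raise ValueError("covariance matrix must be positive definite")
    dx, dy = x[0] - mu[0], x[1] - mu[1]
    # Quadratic form using the closed-form inverse of a 2x2 matrix.
    quad = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return math.log(det) + quad


# Hypothetical class statistics: class 1 is compact, class 2 is widely dispersed.
CLASSES = {
    1: {"mean": (10.0, 10.0), "cov": ((1.0, 0.0), (0.0, 1.0))},
    2: {"mean": (20.0, 20.0), "cov": ((16.0, 12.0), (12.0, 16.0))},
}


def classify(x, distance):
    """Return the label of the class with the smallest distance to x."""
    return min(CLASSES, key=lambda k: distance(x, CLASSES[k]))


pixel = (14.0, 14.0)
by_euclid = classify(pixel, lambda x, c: euclidean_sq(x, c["mean"]))
by_gml = classify(pixel, lambda x, c: gml_distance_sq(x, c["mean"], c["cov"]))
print(by_euclid, by_gml)
```

The pixel lies closer to the mean of class 1, but it is many standard deviations away from that compact class and well inside the spread of class 2, so the two rules disagree.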

As  previously  mentioned,  Gaussian  maximum  likelihood  always  assigns  a  pixel  to 
one  of  the  target  classification  classes.  This  can  be  undesirable  especially  in  the  case 
where  a  particular  pixel  is  below  some  threshold  for  membership  in  any  target  class,  and 
it  would  be  most  advantageous  to  assign  this  pixel  to  a  "background"  or  "other"  class. 
Recall  the  Mahalanobis  distance  measure  as  defined  in  equation  4.2.17: 

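$$d_M(\mathbf{x}, \boldsymbol{\mu}_i)^2 = \ln|\Sigma_i| + (\mathbf{x} - \boldsymbol{\mu}_i)^T \Sigma_i^{-1} (\mathbf{x} - \boldsymbol{\mu}_i) \tag{4.2.19}$$

and its associated classification rule equation 4.2.18:

$$\mathbf{x} \in w_i \ \text{ if } \ d_M(\mathbf{x}, \boldsymbol{\mu}_i)^2 < d_M(\mathbf{x}, \boldsymbol{\mu}_j)^2 \ \text{ for all } j \neq i \tag{4.2.20}$$

We desire to incorporate a threshold value into this measure to describe the level of
certainty to be attained prior to assigning a pixel to a given class. Mathematically, this
can be expressed as:

$$\mathbf{x} \in w_i \ \text{ if } \ d_M(\mathbf{x}, \boldsymbol{\mu}_i)^2 < d_M(\mathbf{x}, \boldsymbol{\mu}_j)^2 \ \text{ for all } j \neq i \ \text{ and } \ d_M(\mathbf{x}, \boldsymbol{\mu}_i)^2 < T_i \tag{4.2.21}$$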

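where $T_i$ is the threshold value to attain membership in class $i$. It can be shown (Johnson
and Wichern, 1992) that for a $D$-dimensional multivariate normal distribution, the quadratic
form on the left-hand side of the following relationship has a $\chi^2$ (chi squared) distribution,
so that the vectors satisfying the relationship fill a hyperellipsoid of known probability:

$$(\mathbf{x} - \boldsymbol{\mu}_i)^T \Sigma_i^{-1} (\mathbf{x} - \boldsymbol{\mu}_i) \leq T_i \tag{4.2.22}$$

This distribution has $D$ degrees of freedom, and the hyperellipsoid contains a probability of
$1-\alpha$ when $T_i$ is set to the upper $\alpha$ percentage point of the distribution. In a more compact
notation this is typically represented as $\chi^2_D(\alpha)$. Applying this result to our distance
measure results in the following classification algorithm:

$$\mathbf{x} \in w_i \ \text{ if } \ d_M(\mathbf{x}, \boldsymbol{\mu}_i)^2 < d_M(\mathbf{x}, \boldsymbol{\mu}_j)^2 \ \text{ for all } j \neq i \ \text{ and } \ d_M(\mathbf{x}, \boldsymbol{\mu}_i)^2 < \chi^2_D(\alpha) \tag{4.2.23}$$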

Intuitively, equation 4.2.23 is comforting. As the value of $\alpha$ is decreased, thereby
increasing the probability that a particular spectral intensity vector is within the
hyperellipsoid of a given class, the $\chi^2_D(\alpha)$ distribution returns larger and larger values.
This can be visualized as the hyperellipsoid swelling or inflating in feature space to
encompass a greater and greater volume.
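
The swelling of the hyperellipsoid can be checked numerically for the special case of two bands ($D = 2$), where the chi-squared distribution has the closed-form upper percentage point $\chi^2_2(\alpha) = -2\ln\alpha$. The sketch below is restricted to that case on purpose; for other values of $D$ a statistical library or table would be needed. The function names are hypothetical, and the threshold is applied to the quadratic form of equation 4.2.22 rather than to the full distance of equation 4.2.17.

```python
import math


def chi2_threshold_2bands(alpha):
    """Upper-alpha point of the chi-squared distribution with 2 degrees of freedom."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    return -2.0 * math.log(alpha)


def inside_class_ellipse(x, mu, cov, alpha):
    """True if the quadratic form of equation 4.2.22 is within the threshold."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx, dy = x[0] - mu[0], x[1] - mu[1]
    quad = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return quad <= chi2_threshold_2bands(alpha)


# Smaller alpha gives a larger threshold, i.e. a larger ellipse.
for alpha in (0.10, 0.05, 0.01):
    print(alpha, round(chi2_threshold_2bands(alpha), 3))
```

A pixel three standard deviations from the mean of a unit-variance class is rejected at $\alpha = 0.05$ but accepted at $\alpha = 0.01$, which matches the inflating-ellipsoid picture.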

The nPDF algorithm was codeveloped at the Department of Earth and
Atmospheric Sciences at Purdue University by Haluk Cetin and Donald W. Levandowski.
This method utilizes what are known as frequency perspective plots to allow the display
of multi-dimensional data on two-dimensional display devices (such as a CRT), to reduce
data dimensionality in a manner similar to the Karhunen-Loève transformation, to
classify  the  multidimensional  data  in  either  a  supervised  or  unsupervised  manner,  and  to 
perform  cluster  analysis  prior  to  classification.  In  essence,  the  algorithm  projects 
n-dimensional  data  onto  a  two-dimensional  plane  through  the  use  of  two  distance 
measurements.  The  straightforward  mathematical  development  that  follows  will  center 
on  the  derivation  of  the  projection  technique  and  how  the  resulting  projection  can  be 
utilized  for  classification.  The  development  presented  here  essentially  follows  that 
presented  by  Cetin  and  Levandowski  (1991). 


4.3.1  Mathematical  Development  of  nPDF 


As previously mentioned, the nPDF algorithm essentially projects n-dimensional
data onto the two-dimensional plane where considerably simplified classification
techniques can be applied. In a two-dimensional feature space, a feature vector is defined
as:

$$\mathbf{x} = (x_1, x_2)^T$$

A simple way to uniquely describe the position of the point $(x_1, x_2)$ in this feature space
is as the intersection of two arcs originating from two reference points or "corners".
Consider the diagram in Figure 4.3.1 where band 1 and band 2 are two bands of interest
in the multispectral data. The range of the axes is assumed to be 256 ($2^8$ as per 8-bit TM
data). The magnitudes of the radii of the arcs are:

$$d_1 = \sqrt{x_1^2 + x_2^2} \tag{4.3.1}$$

$$d_2 = \sqrt{(R - x_1)^2 + x_2^2} \tag{4.3.2}$$

where $R$ represents the maximum value of the data (255). By extending this concept to a
three-dimensional feature space, Figure 4.3.2 is easily obtained.


Figure 4.3.1 - a two-dimensional feature space

Figure 4.3.2 - three-dimensional vector and its feature space representation


As previously defined, the distances $d_1$ and $d_2$ from corners $C_1$ and $C_2$ to the point of
interest are:

$$d_1 = \sqrt{x_1^2 + x_2^2 + x_3^2} \tag{4.3.3}$$

$$d_2 = \sqrt{x_1^2 + x_2^2 + (R - x_3)^2} \tag{4.3.4}$$

Complementary equations for the distances from the other reference corners of the
feature space can be easily derived following this same process. This process may be
extended to higher dimensions where the feature vector is:

$$\mathbf{x} = (x_1, x_2, \ldots, x_n)^T$$

and the distance measures can be generalized to:

$$d_i = \sqrt{\sum_{j=1}^{n} \left( x_j^2\, a_j + (R - x_j)^2\, b_j \right)} \tag{4.3.5}$$

where $i$ is the reference "corner number" of the hypercube, $j$ is the band number, $n$ is the
number of bands, and $a_j$ and $b_j$ are values necessary to calculate the distances to the
corners of the n-dimensional hypercube. The values of $a_j$ and $b_j$ necessary to calculate
the $d_1$ through $d_4$ measurements on the LANDSAT TM six-dimensional data are
summarized in table 4.3.1.


Table 4.3.1 - values of a_j and b_j

               d1         d2         d3         d4
TM Band     a_j  b_j   a_j  b_j   a_j  b_j   a_j  b_j
   1         1    0     1    0     1    0     1    0
   2         1    0     1    0     0    1     0    1
   3         1    0     0    1     1    0     0    1
   4         1    0     1    0     1    0     1    0
   5         1    0     1    0     0    1     0    1
   7         1    0     0    1     1    0     0    1
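
Equation 4.3.5 and Table 4.3.1 translate directly into a short routine. The sketch below stores the table as a dictionary keyed by corner number and evaluates the four corner distances for a six-band pixel. The names `CORNER_COEFFS` and `corner_distance` are introduced here for illustration only.

```python
import math

# (a_j, b_j) pairs for TM bands 1, 2, 3, 4, 5, 7, transcribed from Table 4.3.1.
CORNER_COEFFS = {
    1: [(1, 0), (1, 0), (1, 0), (1, 0), (1, 0), (1, 0)],
    2: [(1, 0), (1, 0), (0, 1), (1, 0), (1, 0), (0, 1)],
    3: [(1, 0), (0, 1), (1, 0), (1, 0), (0, 1), (1, 0)],
    4: [(1, 0), (0, 1), (0, 1), (1, 0), (0, 1), (0, 1)],
}


def corner_distance(x, coeffs, r=255):
    """Distance from pixel x to one reference corner (equation 4.3.5)."""
    return math.sqrt(sum(a * xj ** 2 + b * (r - xj) ** 2
                         for xj, (a, b) in zip(x, coeffs)))


# Hypothetical six-band pixel (digital counts for TM bands 1, 2, 3, 4, 5, 7).
pixel = (80, 20, 144, 36, 71, 54)
distances = {i: corner_distance(pixel, c) for i, c in CORNER_COEFFS.items()}
print(distances)
```

Each $(a_j, b_j)$ pair simply selects whether band $j$ is measured from 0 or from $R$, so every corner of the table corresponds to one vertex of the six-dimensional hypercube.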

With  these  distance  measures  developed,  we  are  now  ready  to  derive  the  rest  of 
the  nPDF  components.  Consider  the  following  equation: 


$$nPDF_i = S \cdot \frac{d_i}{2^{BIT}\sqrt{NB}} \tag{4.3.6}$$

where $d_i$ is the distance from the "ith" corner to the point of interest as previously defined,
BIT is the number of bits in the input data (8 for LANDSAT TM), NB is the number of
bands being utilized (6 for LANDSAT TM), and S is a scale factor. Note that the ratio
term in equation 4.3.6 returns a value between zero and slightly less than one.
Multiplication by the scale factor allows the frequency perspective plot to be stretched so
that finer details in the plot may be observed. The nPDF components are utilized in pairs
to produce the nPDF, or frequency perspective, plots. By our previous definitions, 6
plots are possible utilizing the different distance measures arising from the following
unique combinations: C1-C2, C1-C3, C1-C4, C2-C3, C2-C4, and C3-C4. Utilization of
different principal corner pairs produces differing nPDF plots. By choosing different
corner pairs, the separation between desired target classes may be increased to yield
increased classification accuracy.

To illustrate the procedure of creating the nPDF plot and the ensuing
classification process, a synthetic 3-dimensional training set consisting of 4 normally
distributed classes was created. The data set ranges in value from 0 to 31. As previously
defined, R is set equal to 31, BIT is set equal to 5 (because $2^5 = 32$), NB=3, and S=64.

The scale factor dictates the size of the resulting nPDF plot, and in this case, a 64-by-64
plot will be created. Because of this, a 64-by-64 array must be allocated and initialized to
contain all zeros prior to any further calculations. The nPDF plot is most simply
considered to be a height field and the corresponding array will be referenced by
height_field[row][col]. The following algorithm in pseudocode will produce an nPDF
plot for the synthetic data set utilizing the $d_1$ and $d_4$ measurements:

    for row = 1 to num_rows_in_image
        for col = 1 to num_cols_in_image
            d1 = sqrt( sum over j of ( x_j^2 * a1_j + (R - x_j)^2 * b1_j ) )
            d4 = sqrt( sum over j of ( x_j^2 * a4_j + (R - x_j)^2 * b4_j ) )
            npdf1 = S * d1 / ( 2^BIT * sqrt(NB) )
            npdf4 = S * d4 / ( 2^BIT * sqrt(NB) )
            height_field[npdf1][npdf4] = height_field[npdf1][npdf4] + 1
        next col
    next row

where $a_{ij}$ and $b_{ij}$ reference the components of the following vectors as defined in table
4.3.1:

$$\mathbf{a}_1 = (1, 1, 1)^T, \quad \mathbf{b}_1 = (0, 0, 0)^T, \quad \mathbf{a}_4 = (1, 0, 0)^T, \quad \mathbf{b}_4 = (0, 1, 1)^T$$

Note that the algorithm essentially "counts up" the number of data elements (or pixels) in
the training set that share the same $npdf_1$ and $npdf_4$ values and "bins" them together. This
is accomplished by rounding off the calculated nPDF value to the nearest integer and
using this value as the index for the height field array which is then sequentially
incremented. This process produces the height-field effect as previously discussed. The
synthetic training data set was created with the four normally distributed classes, each
with unit variance, and the following mean vectors:


$$\mathbf{m}_1 = (28, 28, 28)^T, \quad \mathbf{m}_2 = (4, 4, 4)^T, \quad \mathbf{m}_3 = (16, 4, 4)^T, \quad \mathbf{m}_4 = (20, 20, 10)^T$$

Table 4.3.2 lists the nPDF values for the mean vectors:

Table 4.3.2 - values of npdf_1 and npdf_4 for the synthetic data

class    npdf_1    npdf_4
  1        56        33
  2         8        44
  3        20        48
  4        35        36
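
The projection and binning procedure can be written out in a few lines. The sketch below follows equations 4.3.5 and 4.3.6 with R = 31, BIT = 5, NB = 3, and S = 64, and uses corner vectors (1,1,1)/(0,0,0) for $d_1$ and (1,0,0)/(0,1,1) for $d_4$ together with class means (28,28,28), (4,4,4), (16,4,4), and (20,20,10). These vectors are the values read from the text and are consistent with Table 4.3.2. The function names, the random seed, the number of samples per class, and the clamping of samples to the range 0 to 31 are assumptions made for this example.

```python
import math
import random

R, BIT, NB, S = 31, 5, 3, 64
A1, B1 = (1, 1, 1), (0, 0, 0)   # coefficients for corner 1
A4, B4 = (1, 0, 0), (0, 1, 1)   # coefficients for corner 4
MEANS = {1: (28, 28, 28), 2: (4, 4, 4), 3: (16, 4, 4), 4: (20, 20, 10)}


def corner_distance(x, a, b):
    """Equation 4.3.5 for one reference corner."""
    return math.sqrt(sum(aj * xj ** 2 + bj * (R - xj) ** 2
                         for xj, aj, bj in zip(x, a, b)))


def npdf_index(x, a, b):
    """Equation 4.3.6, rounded half up so it can index the height field."""
    value = S * corner_distance(x, a, b) / (2 ** BIT * math.sqrt(NB))
    return int(value + 0.5)


def npdf_coords(x):
    return npdf_index(x, A1, B1), npdf_index(x, A4, B4)


def build_height_field(pixels):
    """Count how many pixels fall into each (npdf_1, npdf_4) bin."""
    field = [[0] * S for _ in range(S)]
    for x in pixels:
        i, j = npdf_coords(x)
        field[i][j] += 1
    return field


def synthetic_pixels(per_class=200, seed=0):
    """Unit-variance Gaussian samples around each mean, clamped to 0..R."""
    rng = random.Random(seed)
    pixels = []
    for mean in MEANS.values():
        for _ in range(per_class):
            pixels.append(tuple(min(R, max(0, rng.gauss(m, 1.0))) for m in mean))
    return pixels


field = build_height_field(synthetic_pixels())
for label, mean in MEANS.items():
    print(label, npdf_coords(mean))
```

Running the projection on the four mean vectors gives the coordinate pairs listed in Table 4.3.2, and the height field accumulates one count per synthetic pixel.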

We  expect  to  see  concentrations  of  pixels  scattered  around  these  coordinates  in  the  nPDF 
plot.  The  height  field  and  contour  diagrams  below  were  created  by  applying  the  nPDF 
algorithm to the synthetic data and utilizing the $d_1$ and $d_4$ measurements.

Examining  the  plots  confirms  the  previous  conjecture  as  spikes  in  the  height  field  and 
"blobs"  in  the  contour  plot  are  located  at  the  previously  calculated  nPDF  coordinates  for 
the  centers  of  the  training  class  distributions. 


Figure  4.3.3  -  nPDF  height-field  and  contour  plot 

Once the locations of the class mean vectors are calculated in the nPDF space, an
image classification algorithm very similar to the minimum-distance-to-the-means can be
easily developed. Consider the following image classifier logic:

$$\mathbf{x} \in w_i \ \text{ if } \ d_{nPDF}(\mathbf{x}, \boldsymbol{\mu}_i)^2 < d_{nPDF}(\mathbf{x}, \boldsymbol{\mu}_j)^2 \ \text{ for all } j \neq i \tag{4.3.7}$$

where $d_{nPDF}(\mathbf{x}, \boldsymbol{\mu}_i)^2 = (\mathbf{x} - \boldsymbol{\mu}_i)^T(\mathbf{x} - \boldsymbol{\mu}_i)$. This is identical to the Euclidean distance
measure,  with  the  important  exception  that  this  calculation  need  be  performed  only  in  the 
two-dimensional  nPDF  feature  space.  While  there  is  considerable  computational  savings 
in  performing  this  measure  in  the  space  with  reduced  dimensionality,  recall  that  the 
intensive  projection  calculations  must  be  completed  for  every  pixel  in  the  image  prior  to 
its  classification.  As  intuitively  expected,  the  performance  of  this  image  classifier  is  very 
similar  to  the  multidimensional  minimum-distance-to-the-means  algorithm,  and  is  fraught 
with  the  same  inability  to  account  for  data  dispersion  in  the  training  classes. 

After calculating the nPDF plot for the training classes, classification of the entire
image can be accomplished in a unique manner. Consider the previously calculated
contour plot that has been arbitrarily segmented as shown in Figure 4.3.4. The regions
marked $R_1$ through $R_4$ correspond to the previously defined target classes. This
mapping can be utilized as a lookup table (LUT) if we fill the array elements
corresponding to a particular polygon with the numerical value of that class (a 1, 2, 3, or 4 in
this simple example). The $npdf_1$ and $npdf_4$ values are calculated for each pixel in the
original image. With these values, we simply look up the target classification class value
and assign the pixel to it.

Figure 4.3.4 - nPDF contour plot with overlaid classification boundaries
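
A minimal sketch of the lookup-table idea follows. Because the user-drawn polygons of Figure 4.3.4 are not available, the regions here are hypothetical square boxes of half-width 4 centred on the coordinates of Table 4.3.2, and the value 0 is used to mark a "background" cell. The corner vectors and parameters are those of the synthetic example, and all names are illustrative.

```python
import math

R, BIT, NB, S = 31, 5, 3, 64
A1, B1 = (1, 1, 1), (0, 0, 0)   # coefficients for corner 1
A4, B4 = (1, 0, 0), (0, 1, 1)   # coefficients for corner 4
BACKGROUND = 0


def npdf_coords(x):
    """Project a pixel to its rounded (npdf_1, npdf_4) coordinates."""
    def index(a, b):
        d = math.sqrt(sum(aj * xj ** 2 + bj * (R - xj) ** 2
                          for xj, aj, bj in zip(x, a, b)))
        return int(S * d / (2 ** BIT * math.sqrt(NB)) + 0.5)
    return index(A1, B1), index(A4, B4)


def build_lut(regions):
    """regions maps class label -> (row_min, row_max, col_min, col_max), inclusive."""
    lut = [[BACKGROUND] * S for _ in range(S)]
    for label, (r0, r1, c0, c1) in regions.items():
        for r in range(r0, r1 + 1):
            for c in range(c0, c1 + 1):
                lut[r][c] = label
    return lut


# Hypothetical stand-ins for the user-drawn polygons.
CENTERS = {1: (56, 33), 2: (8, 44), 3: (20, 48), 4: (35, 36)}
REGIONS = {k: (r - 4, r + 4, c - 4, c + 4) for k, (r, c) in CENTERS.items()}
LUT = build_lut(REGIONS)


def classify(x):
    """Classify one pixel with a single table lookup."""
    row, col = npdf_coords(x)
    return LUT[row][col]


print(classify((27, 29, 28)), classify((0, 0, 0)))
```

Once the table is filled, classifying a pixel costs only the projection and one array access, and any cell left at 0 acts as the background class without further statistical calculation.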

This method of supervised classification, where a human specifies the
classification boundaries in a projection of feature space, has some notable advantages.
Foremost of these is that the classification performance depends entirely on the user's
selection  of  bounding  polygons,  which  can  be  readily  modified  to  account  for  subtle 
shape  fluctuations  in  the  data  distribution  that  could  be  extremely  difficult  to  model 
mathematically.  Complicated  and  time  consuming  calculations  of  the  multi-dimensional 
bounding  volume  or  distance  measures  need  not  be  completed  as  the  simplified 
two-dimensional  boundaries  can  be  implemented  as  LUTs  in  the  nPDF  space.  Also  no 
difficult  statistical  calculations  are  required  to  determine  if  a  pixel  should  be  assigned  to  a 
"background"  or  "other"  class  as  is  necessary  in  GML  classification.  The  most  notable 
disadvantage  of  this  method  is  the  subjective  placement  of  the  boundaries  which  are 
difficult  to  reproduce  accurately. 

The nPDF algorithm also has considerable utility for estimating the number of
target classes present in a data set. This information is very useful in cluster analysis, and
is at the heart of unsupervised classification methodologies. Illustrating this facet of the
algorithm is also quite simple. If the nPDF plot is computed for the entire image, we
would expect that masses of pixels with similar spectral intensity vectors would
generate high peaks in the plot. By simply counting the number of these spikes, a
reasonable estimate of the number of target classification classes K can be made. This
process is illustrated in Figure 4.3.5, which was created by calculating an nPDF plot for an
"entire image" of the synthetic data set, to which some random "background" noise was
added. The random noise appears as small data fluctuations at the base of the diagram
while the four peaks represent the major data clusters in the data set.

Figure 4.3.5 - nPDF height-field plot of 4 clusters in the presence of noise

4.4 Fuzzy ARTMAP Neural Network


A  review  of  the  current  literature  in  the  field  of  remote  sensing  will  indicate  that 
many  neural-network  architectures  have  been  utilized  for  classifying  image  data.  The 
chief  advantage  of  utilizing  neural  networks  for  classification  arises  from  the  fact  that  a 
single  flexible  learning  algorithm  is  able  to  derive  an  optimal  decision  rule  for  a  given 
situation  from  the  training  data.  The  chief  problems  of  this  family  of  algorithms  are  long 
learning  times  and  the  inability  to  deal  with  "fuzzy"  data.  The  term  "fuzzy"  relates  the 
concept  of  the  extent  to  which  a  given  feature  is  present  in  an  exemplar.  Terms  such  as 
"close  to  home",  "much  greater  than  five",  and  "quite  young",  are  linguistic  examples  that 
portray  this  concept.  Human  beings  are  naturally  fuzzy  in  their  decision-making 
processes.  While  it  is  quite  simple  for  a  graduate  student  to  label  a  given  instructor  as  a 
"good  teacher",  it  has  traditionally  been  impossible  to  provide  a  computer  with  the  tools 
to  make  this  same  type  of  classification.  Neural  networks  are  typically  implemented  on 
general-purpose  computer  hardware  and  are  hindered  by  their  inherent  binary  nature.  In 
the mid-1960s, L. A. Zadeh derived the mathematical groundwork for a field of study
that  would  become  known  as  fuzzy  logic.  Armed  with  these  new  tools,  engineers  and 
researchers  have  utilized  these  concepts  to  permit  the  machine  computation  of  fuzzy  data. 

An  advanced  neural  network,  the  fuzzy  ARTMAP,  attacks  both  traditional 
shortcomings  of  neural  networks  in  a  very  direct  manner.  Conventional  neural  network 
methodologies  typically  require  training  data  to  be  applied  repeatedly,  often  thousands  of 
times,  to  determine  a  set  of  decision  rules  that  produce  a  desired  level  of  performance. 

By  comparison,  the  Adaptive  Resonance  Theory  and  MAPping  (ARTMAP)  approach  can 
perform  the  same  classification  to  the  same  level  of  accuracy  after  only  a  handful  of 
training  cycles,  or  epochs.  The  introduction  of  fuzzy  logic  to  this  family  of  algorithms 
greatly  increases  their  flexibility  because  features  that  previously  could  only  be  said  to  be 
present  or  absent  can  now  be  more  completely  described.  When  dealing  exclusively  with 
binary data, the algorithm reduces to its crisp set theory predecessors. The overview in
section  4.4  essentially  summarizes  that  of  Carpenter  et  al.,  (1992). 

4.4.1 Overview of fuzzy ARTMAP Neural Network Architecture

Consider  the  overall  system  diagram  for  the  neural  network  of  interest: 


Figure  4.4.1  -  Fuzzy  ARTMAP  Architecture 

This  neural  network  is  not  as  complex  as  it  may  appear.  The  network  must  be  trained 
before  the  LANDSAT  TM  images  can  be  classified.  This  is  accomplished  by  applying 
the  spectral  intensity  vectors  of  the  training-class  pixels  at  point  a,  while  the 
corresponding  label  or  target  class  will  be  applied  simultaneously  at  point  b.  An 
important  calculation,  known  as  complement  coding,  takes  place  on  the  input  data  in  the 
preprocessing fields $F_0^a$ and $F_0^b$. Initially the input data (represented here as the vector
$\mathbf{I}$) must have its elements scaled between 0 and 1. Mathematically we can describe the
rescaling operation as:

$$I_i = \frac{a_i - \min}{\max - \min} \tag{4.4.1}$$
where $a_i$ represents an individual element of the intensity vector, and max and min
denote the maximal and minimal data values. The rescaling operation can be
accomplished with either global or local maxima and minima. In this study, to account
for all possible occurrences, the minimum digital count value will be set to 0 and the
maximum value to 255. Note that the rescaling operation preserves amplitude
information and provides a very useful mathematical view of the world. Consider the
following pixel intensity vector and its normalized variation, $\mathbf{I}_{norm}$:


$$\mathbf{I} = (80,\ 20,\ 144,\ 36,\ 71,\ 54)^T, \qquad \mathbf{I}_{norm} = (0.314,\ 0.078,\ 0.565,\ 0.141,\ 0.278,\ 0.212)^T$$

The complement of the input vector must also be calculated. This can be mathematically
defined as:

$$a_i^c = 1 - a_i \tag{4.4.2}$$

and we redefine

$$\mathbf{I} = (\mathbf{a}, \mathbf{a}^c) \tag{4.4.3}$$

Accomplishing the complementation operation on the input pixel results in:

$$\mathbf{I} = (0.314,\ 0.078,\ 0.565,\ 0.141,\ 0.278,\ 0.212,\ 0.686,\ 0.922,\ 0.435,\ 0.859,\ 0.722,\ 0.788)^T$$
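
The rescaling and complement coding steps of equations 4.4.1 through 4.4.3 are easy to verify in code. The sketch below uses the global limits 0 and 255 stated in the text. The pixel values are those of the example, with the second band taken as 20, a value inferred here from its normalized value of 0.078; the function name `complement_code` is hypothetical.

```python
def complement_code(pixel, lo=0, hi=255):
    """Rescale to [0, 1] (equation 4.4.1), then append the complement (4.4.2, 4.4.3)."""
    a = [(v - lo) / (hi - lo) for v in pixel]
    return a + [1.0 - v for v in a]


# Second band value (20) is inferred from the normalized value 0.078.
pixel = (80, 20, 144, 36, 71, 54)
coded = complement_code(pixel)
print([round(v, 3) for v in coded])
print(sum(coded))
```

A useful property of complement coding is that the norm of the coded vector always equals the number of bands (6 here, up to floating-point rounding), whatever the pixel values are.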


The data at the input field of ART$_a$, $F_1^a$, is now a normalized and complemented vector
with 2N elements, where N is the number of bands. The target class is similarly represented
in input field $F_1^b$ as a vector with 2M elements, where M is the number of target classes.
The target class data label for this field must be encoded in a
binary manner. If the previously defined training class examples are used, we will
encode grass as 0001, water as 0010, pine trees as 0100, and deciduous trees as 1000.
Their complements are 1110, 1101, 1011, and 0111, respectively. As
previously mentioned, the fuzzy ARTMAP reduces to a crisp set theory implementation
in the presence of binary data. It is interesting to note that in this study the ART$_a$
module will be utilizing fuzzy logic, while its linked sister module will perform similar
calculations using traditional crisp set theory mathematics.

The adjustment of the weights that provide the system's long-term memory will
now be discussed. As in a regular ART network, all nodes in the input field are fully
interconnected, and the $F_1$ activity vector will be represented by $\mathbf{x} = (x_1, \ldots, x_{2N})$.
Similarly, the classification vector, or field activity vector of $F_2$, will be represented by
$\mathbf{y} = (y_1, \ldots, y_{2M})$. Note that the number of nodes in either field can be arbitrary, but there
must be more nodes in the classification field than classification categories. We will also
define the weight vector (or long-term memory trace) between the "jth" node of $F_2$ and all
nodes in the input field as $\mathbf{w}_j = (w_{j1}, \ldots, w_{j,2N})$. Note that, while the crisp-set theory
implementation of the ARTMAP algorithm has both a bottom-up and a top-down weight
vector, the fuzzy ARTMAP has only this single weight vector for each node. Initially
this weight vector has all elements set to unity. Figure 4.4.2 illustrates the dynamics
between the $F_1$ and $F_2$ fields.

Figure 4.4.2 - depiction of the weight vector $\mathbf{w}_j$ between the input field ($F_1$) and classification field ($F_2$)

As  in  a  regular  adaptive  resonance  theory  network,  the  role  of  the  classification 
field  is  to  determine  a  "winning"  node  and  pass  this  information  to  the  rest  of  the  system. 
In  the  fuzzy  ARTMAP,  this  category  choice  is  accomplished  through  the  use  of  fuzzy 
mathematics.  Consider  the  following  category  choice  function: 


$$T_j(\mathbf{I}) = \frac{|\mathbf{I} \wedge \mathbf{w}_j|}{\alpha + |\mathbf{w}_j|} \tag{4.4.4}$$

where "| |" denotes the norm operator, which is defined as:

$$|\mathbf{a}| = \sum_i |a_i| \tag{4.4.5}$$

The $\alpha$ term is a choice parameter which is set to be greater than zero. The fuzzy AND
operator "$\wedge$" is a relative of the logical AND operator and simply returns a vector whose
elements are the minimum value of the corresponding elements in the set of vectors being
processed. This "min" operation is easily defined as:

$$(\mathbf{p} \wedge \mathbf{q})_i = \min(p_i, q_i) \tag{4.4.6}$$

Mathematically speaking, equation 4.4.4 determines the degree to which the weight
vector is a fuzzy subset of the input vector. Note that when the weight and input vectors
are very similar, $\mathbf{I} \wedge \mathbf{w}_j \approx \mathbf{w}_j$ and $T_j$ will approach unity, implying that node $j$ is the best
choice. A category choice is made when only one $F_2$ node becomes activated. A node is
said to have become activated when it has a value of one and all others have a value of
zero. This operation is often referred to as "winner take all". The chosen node index $J$ is
governed by the following simple relation:

$$T_J = \max\{T_j(\mathbf{I}) : j = 1, 2, \ldots, 2M\} \tag{4.4.7}$$
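
The category choice of equations 4.4.4 through 4.4.7 can be sketched as below. The input vector, the three weight vectors, and the value of the choice parameter are hypothetical; the last weight vector of all ones stands for an uncommitted node. The helper names are introduced for this example only.

```python
def fuzzy_and(p, q):
    """Element-wise minimum, equation 4.4.6."""
    return [min(pi, qi) for pi, qi in zip(p, q)]


def norm(v):
    """Sum of absolute values, equation 4.4.5."""
    return sum(abs(e) for e in v)


def choice(i_vec, w, alpha=0.001):
    """Category choice function, equation 4.4.4."""
    return norm(fuzzy_and(i_vec, w)) / (alpha + norm(w))


def choose_category(i_vec, weights, alpha=0.001):
    """Winner-take-all selection, equation 4.4.7."""
    scores = [choice(i_vec, w, alpha) for w in weights]
    # list.index returns the first match, so ties go to the lowest index.
    return scores.index(max(scores))


# Hypothetical complement-coded input (a = (0.3, 0.7)) and weight vectors.
i_vec = [0.3, 0.7, 0.7, 0.3]
weights = [
    [0.2, 0.6, 0.6, 0.2],   # close to a fuzzy subset of the input
    [0.8, 0.1, 0.1, 0.8],   # poor match
    [1.0, 1.0, 1.0, 1.0],   # uncommitted node
]
print([round(choice(i_vec, w), 4) for w in weights])
print(choose_category(i_vec, weights))
```

The first node wins because its weight vector is element-wise no larger than the input, so the numerator equals $|\mathbf{w}_j|$ and the ratio is close to one.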

In the event of a tie, the node with the lowest index is chosen. When the signals
from the input and classification fields reinforce one another, resonance occurs. This
process occurs when:

$$\frac{|\mathbf{I} \wedge \mathbf{w}_J|}{|\mathbf{I}|} \geq \rho \tag{4.4.8}$$

where $\rho$ is the vigilance parameter. As $\rho$ is increased, the algorithm has to be more and
more certain of the assignment of a particular exemplar to a specific class before
classification is executed. This is exactly what equation 4.4.8 states mathematically.
Technically, the equation returns a measure of the degree to which the input vector is a
fuzzy subset of the weight vector. More intuitively put, note that the proportion only
becomes large (near unity) when the input and weight vectors have similar values. This
is exactly why the weight vector is also termed a "long-term memory trace". The weight
vector of each node essentially "learns" (defines) a multidimensional region that is
populated solely by vectors from one target class. After the network is trained, new input
vectors located within the bounds of this region are assigned to that class. Whenever an
input pixel is determined to reside within the bounds of or close to a target region as
determined by the vigilance parameter, resonance will occur between the input and
classification fields and the same F2 node will be activated. Learning will occur at this
point, and this process is governed by the relation:

$$\mathbf{w}_J^{(new)} = \beta\,(\mathbf{I} \wedge \mathbf{w}_J^{(old)}) + (1 - \beta)\,\mathbf{w}_J^{(old)} \tag{4.4.9}$$

Note that when the weight vector is essentially a fuzzy subset of the input vector, then
$\mathbf{I} \wedge \mathbf{w}_J \approx \mathbf{w}_J$ and little "learning" will occur. This is desirable and leads to enhanced
stability. The parameter $\beta$ is termed a learning-rate parameter and is bounded between
zero and one. For fast learning, $\beta \approx 1$. This makes the weights closely track the input
vectors, and is useful when initially training the network. For fast encoding of noisy data
sets (the typical real-world situation) the learning parameter is set to 1 until a node
becomes initially activated. When a node is initially assigned to represent a target
classification class, it is said to have been committed. Once a node has been committed,
the value of $\beta$ is reduced. Though not obvious at this point, it is important to
note that the elements of the weight vectors can only decrease monotonically, because
each update blends the old weights with $\mathbf{I} \wedge \mathbf{w}_J$, whose elements never exceed the old
weights. This is directly related to classification boundaries expanding in feature space.
This important facet of the algorithm will be highlighted in a simple two-dimensional and
two-class example to follow.
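
The vigilance test (equation 4.4.8) and the learning rule (equation 4.4.9) can be sketched as follows. This is a minimal Python illustration rather than the implementation used in this study; the function names are hypothetical, and the norm is computed with a plain sum on the assumption that all elements are non-negative, as they are after normalization.

```python
def fuzzy_and(p, q):
    """Element-wise minimum of two equal-length vectors."""
    return [min(pi, qi) for pi, qi in zip(p, q)]


def match(I, w):
    """Match value |I ^ w| / |I| that is compared with rho in eq. 4.4.8."""
    return sum(fuzzy_and(I, w)) / sum(I)


def learn(I, w_old, beta=1.0):
    """Learning rule of eq. 4.4.9; beta = 1 corresponds to fast learning."""
    return [beta * m + (1.0 - beta) * wo
            for m, wo in zip(fuzzy_and(I, w_old), w_old)]
```

For example, with the weight vector [0.24, 0.75, 0.76, 0.25] and the input [0.3, 0.8, 0.7, 0.2], a learning rate of 0.5 moves the third weight from 0.76 only halfway toward 0.70, and no weight increases.
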

If a pixel is determined not to fall within or close to a particular classification
region, resonance will not occur. The degree of "closeness" to the classification region is
user definable through the vigilance parameter. Resonance does not occur when the
vigilance criterion (equation 4.4.8) is not met. This occurs when:

$$\frac{|\mathbf{I} \wedge \mathbf{w}_J|}{|\mathbf{I}|} < \rho \tag{4.4.10}$$

When this happens, mismatch reset is said to have occurred, and a search operation
begins. The algorithm determines if the pixel is close to or within any of the regions
defined by the weight vectors of the F2 nodes. If no suitable region can be found, another
F2 node can be committed or a new node can be created. When a new node is created,
another classification region is defined internally. Once resonance is attained, learning
proceeds by altering the node's weight vector with the learning rule previously
described (equation 4.4.9).
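
The search just described can be sketched as a single function. This is an illustrative simplification, not the study's code: the name present is hypothetical, only committed nodes are stored in the list weights, and a new node is appended (with the input as its initial weight vector) when every committed node fails the vigilance test. A full implementation would also let uncommitted nodes compete in the category choice.

```python
def fuzzy_and(p, q):
    """Element-wise minimum of two equal-length vectors."""
    return [min(pi, qi) for pi, qi in zip(p, q)]


def present(I, weights, rho, alpha=0.001, beta=1.0):
    """Present one complement-coded input and return the resonating node index.

    Committed nodes are tried in order of decreasing choice value. A node
    that fails the vigilance test is reset (skipped). If all nodes fail, a
    new node is committed with the input as its weight vector.
    """
    def T(w):
        return sum(fuzzy_and(I, w)) / (alpha + sum(w))

    order = sorted(range(len(weights)), key=lambda j: (-T(weights[j]), j))
    for j in order:
        w = weights[j]
        if sum(fuzzy_and(I, w)) / sum(I) >= rho:       # resonance
            weights[j] = [beta * m + (1.0 - beta) * wo
                          for m, wo in zip(fuzzy_and(I, w), w)]
            return j
    weights.append(list(I))                            # commit a new node
    return len(weights) - 1
```
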

For insight into how this algorithm divides up feature space, consider the
following two-dimensional example. Given a general feature vector, the preprocessing
field, F0, will produce the normalized four-dimensional vector:

$$\mathbf{I} = (\mathbf{a}, \mathbf{a}^c) = (a_1, a_2, 1 - a_1, 1 - a_2) \tag{4.4.11}$$

As expected, the weight vector will attempt to map these input values to a region of
feature space through application of the learning rule. The weight vector, in complement
coded form, can then be expressed as:

$$\mathbf{w}_j = (\mathbf{u}_j, \mathbf{v}_j^c) \tag{4.4.12}$$

Geometrically speaking, the weight vector can be thought of as describing a rectangle in
feature space. Consider the diagram in Figure 4.4.3. We will assign the name $R_j$ to the
region near the center of feature space. A measure of its size can be computed by adding
its height and width as follows:

$$|R_j| = |\mathbf{v}_j - \mathbf{u}_j| \tag{4.4.13}$$

During learning, the region $R_j$ grows as the elements of its associated weight vector
decrease monotonically. Consider the case where a new pixel (denoted as $\mathbf{a}$) meets
category choice criteria for this same region. This will cause the very same classification
field node to become active, and the elements of the weight vector will begin to
monotonically decrease during learning. The algorithm seeks to determine the
minimum-sized rectangle in feature space that will encompass all of the input vectors that
cause this node to become active. At this point, we will introduce the fuzzy OR operator
(denoted by "$\vee$"). As might be expected, it is a relative of the crisp OR operator.
Unlike the fuzzy AND operator, this process returns the maximum value of
corresponding elements of the vectors. In feature space, the previously defined rectangle
has enlarged (Figure 4.4.4) to reflect the decrease in the weight vector's elements and the
assignment of the new data point to this class. The weight vector associated with this
new region is then:

$$\mathbf{w}_j^{(new)} = (\mathbf{a} \wedge \mathbf{u}_j, (\mathbf{a} \vee \mathbf{v}_j)^c) \tag{4.4.14}$$


Figure 4.4.3 - weight vector represented as rectangle Rj in feature space
Figure 4.4.4 - region Rj has "grown" to include vector a

It is important to note that the weight vector essentially has "learned" the lower left and
upper right corners of the new region. As already defined, the size of this new region is:

$$|R_j \oplus \mathbf{a}| = |(\mathbf{a} \vee \mathbf{v}_j) - (\mathbf{a} \wedge \mathbf{u}_j)| \tag{4.4.15}$$

where the $\oplus$ operator implies a bounded sum which cannot exceed some value. This
process must be bounded by the vigilance parameter to prevent the entire feature space
from being enclosed by a single rectangle:

$$|R_j| \le (1 - \rho)N \tag{4.4.16}$$

where N is the number of bands or dimensions being used (2 in this case). Note that as
the vigilance parameter $\rho$ is increased, the feature space is segmented by smaller and
smaller rectangles and thus becomes more and more finely partitioned.
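
The geometric reading of the weight vector can be made concrete with a short sketch. The Python below is illustrative only; the helper names (rectangle, size, grown, within_limit) are hypothetical, and fast learning is assumed so that the rectangle grows just enough to enclose the new point, as in equation 4.4.14.

```python
def rectangle(w):
    """Split a complement-coded weight vector (u, v^c) into corners u and v."""
    n = len(w) // 2
    u = w[:n]
    v = [1.0 - c for c in w[n:]]
    return u, v


def size(w):
    """|R_j| = |v - u|: the sum of the rectangle's side lengths (eq. 4.4.13)."""
    u, v = rectangle(w)
    return sum(vi - ui for ui, vi in zip(u, v))


def grown(w, a):
    """Weight vector after the rectangle expands to include point a (eq. 4.4.14)."""
    u, v = rectangle(w)
    u_new = [min(ai, ui) for ai, ui in zip(a, u)]
    v_new = [max(ai, vi) for ai, vi in zip(a, v)]
    return u_new + [1.0 - vi for vi in v_new]


def within_limit(w, rho):
    """Size bound of eq. 4.4.16, with N taken as half the weight vector length."""
    return size(w) <= (1.0 - rho) * (len(w) // 2)
```

For instance, a rectangle that is initially the single point (0.24, 0.75) grows to a size of about 0.11 when the point (0.3, 0.8) is added, which stays within the limit of 0.2 for a vigilance of 0.9.
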

In general, for an M-dimensional vector and given A, the set of all vectors that
activate the node of interest, the following fuzzy-set statements can be made concerning
the class's hyper-rectangle in feature space. The vertices of the hyper-rectangle are
represented by the minimum and maximum values of the individual elements that
encompass the vectors that define class A. Mathematically, this is simply:

$$\mathbf{u}_j = \wedge_j A, \quad (\wedge_j A)_i = \min\{A_i\} \quad \text{and similarly,} \quad \mathbf{v}_j = \vee_j A, \quad (\vee_j A)_i = \max\{A_i\} \tag{4.4.17}$$

where $\{A_i\}$ denotes the ith elements of the vectors in A. Recall that in the preceding
two-dimensional case, $\mathbf{u}_j$ and $\mathbf{v}_j$ were the lower left and upper right corners
respectively. It follows that the size as defined in equation 4.4.13 of the
corresponding hyper-rectangle is:

$$|R_j| = |\vee_j A - \wedge_j A| \tag{4.4.18}$$

The corresponding weight vector can therefore be expressed as:

$$\mathbf{w}_j = (\wedge_j A, (\vee_j A)^c) \tag{4.4.19}$$

Applying the norm operator to this results in:

$$|\mathbf{w}_j| = \sum_i (\wedge_j A)_i + \sum_i \left[1 - (\vee_j A)_i\right] = M - |\vee_j A - \wedge_j A| \tag{4.4.20}$$

Substituting into the size equation (4.4.13) for the hyper-rectangle yields:

$$|R_j| = M - |\mathbf{w}_j| \tag{4.4.21}$$

Realizing that $|\mathbf{w}_j| \ge \rho M$ (the vigilance criterion requires $|\mathbf{I} \wedge \mathbf{w}_j| \ge \rho|\mathbf{I}|$, and
$|\mathbf{I}| = M$ for a complement-coded input), we obtain the very useful result:

$$|R_j| \le (1 - \rho)M \tag{4.4.22}$$

which was utilized in the preceding example (eq. 4.4.16, where the number of dimensions
was written N) without the benefit of this proof.

It is worth mentioning again that this important result states that increasing the vigilance
parameter $\rho$ decreases the size of the corresponding target classification
hyper-rectangles. This produces a finely partitioned feature space with
many classification hyper-rectangles. In feature space, this entire process basically learns
a variation of the classical stacked parallelepiped classification algorithm
(Richards, 1993). The most important difference between the two implementations is
that the fuzzy ARTMAP implementation permits what is known as exception handling,
where separately recognizable subclasses are defined within larger classes. This aspect of
the algorithm is depicted in Figure 4.4.5.


Figure  4.4.5  -  stacked  classification  rectangles 
with  exception  handling  in  feature  space 

A two-dimensional, two-class example demonstrates the utility
of the previously defined category choice, vigilance criteria, and learning/weight
adjustment rules. The two classes in this example will be water and pine trees and only
the "blue" and "green" bands will be used. Consider pixels with the following values:


class    blue digital counts    green digital counts
water    230                    26
pine     61                     191
pine     71                     204

After normalization and complement coding, the pixels are represented by the following
vectors:

class    vector    normalized blue    normalized green    complement of       complement of
                   digital counts     digital counts      normalized blue     normalized green
water    I1        0.9                0.1                 0.1                 0.9
pine     I2        0.24               0.75                0.76                0.25
pine     I3        0.3                0.8                 0.7                 0.2
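
The normalization and complement coding step can be sketched as below. The divisor of 255 is an assumption (8-bit digital counts); the text does not state the normalization constant, and the function name is hypothetical. Under that assumption the sketch reproduces the tabulated values to the precision shown in the table.

```python
def complement_code(counts, max_count=255):
    """Normalize digital counts to [0, 1] and append their complements.

    max_count = 255 assumes 8-bit data; this is an assumption of the sketch.
    """
    a = [c / max_count for c in counts]
    return a + [1.0 - x for x in a]
```

A useful side effect of complement coding is that the norm of every coded vector equals the number of bands, so the denominator of the vigilance test is the same for every pixel.
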

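We will assume that the network is being initially trained. Category choice, or which
classification node will become activated, is given by:

$$T_j(\mathbf{I}) = \frac{|\mathbf{I} \wedge \mathbf{w}_j|}{\alpha + |\mathbf{w}_j|} \tag{4.4.23}$$

Recall that all of the initial elements of the weight vector have a value of unity, and that
the choice parameter $\alpha$ is some small positive number. Every uncommitted node therefore
has the same choice value:

$$T_j(\mathbf{I}_1) = \frac{|\mathbf{I}_1 \wedge \mathbf{w}_j|}{\alpha + |\mathbf{w}_j|} = \frac{|\mathbf{I}_1|}{\alpha + |\mathbf{w}_j|} \quad \text{for every F2 node } j \tag{4.4.24}$$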

which means that node one of the classification field will become active as the lowest
numerical value of j is always chosen. Resonance occurs when the input and
classification fields reinforce one another. This occurs when:

$$\frac{|\mathbf{I}_1 \wedge \mathbf{w}_1|}{|\mathbf{I}_1|} \ge \rho \tag{4.4.25}$$

We will choose to set the vigilance parameter $\rho$ = 0.9. Since the node was previously
uncommitted, resonance will occur because:

$$\frac{|\mathbf{I}_1 \wedge \mathbf{w}_1|}{|\mathbf{I}_1|} = 1 \ge \rho \tag{4.4.26}$$

Learning will now ensue. As previously stated, it is governed by the learning rule
(equation 4.4.9). Given that we are in fast learning mode ($\beta$ = 1) this results in the
following calculation and weight vector:

$$\mathbf{w}_1^{(new)} = \beta\,(\mathbf{I}_1 \wedge \mathbf{w}_1^{(old)}) + (1 - \beta)\,\mathbf{w}_1^{(old)} = [0.9, 0.1, 0.1, 0.9] \tag{4.4.27}$$

Note that this is exactly the input vector! If we perform similar calculations for the first
pine tree pixel, we will find that resonance will not occur on the first node due to the
value of the vigilance parameter. Due to this and the previously stated conditions, the
second classification node will become active and its weight vector will be set equal to:

$$\mathbf{w}_2^{(new)} = \beta\,(\mathbf{I}_2 \wedge \mathbf{w}_2^{(old)}) + (1 - \beta)\,\mathbf{w}_2^{(old)} = [0.24, 0.75, 0.76, 0.25] \tag{4.4.28}$$

Once again, the weight vector becomes the input vector. Recall that in feature space
these weight vectors each define a two-dimensional rectangle. The minimum and
maximum (lower left and upper right) coordinate pairs of the vertices of these rectangles
can be calculated by simply taking the first two components of the weight vector and
complement coding the last two elements. Because each node has learned only one pixel
so far, each rectangle is a single point in the two-dimensional feature space:

$$(\mathbf{u}_1, \mathbf{v}_1) = [0.9, 0.1, 0.9, 0.1] \quad \text{and similarly,} \quad (\mathbf{u}_2, \mathbf{v}_2) = [0.24, 0.75, 0.24, 0.75] \tag{4.4.29}$$

Recall that the weight vectors can only decrease monotonically, and as they decrease the
corresponding classification rectangle in feature space grows. When the next tree pixel is
evaluated by the network, resonance does not occur with the first classification node due
to the vigilance parameter; however, resonance does occur with the second node because
the match value $|\mathbf{I}_3 \wedge \mathbf{w}_2| / |\mathbf{I}_3|$ = 0.945 exceeds $\rho$ = 0.9. As discussed earlier, learning
will now occur resulting in the following new weight vector:

$$\mathbf{w}_2^{(new)} = \beta\,(\mathbf{I}_3 \wedge \mathbf{w}_2^{(old)}) + (1 - \beta)\,\mathbf{w}_2^{(old)} = [0.24, 0.75, 0.70, 0.20] \tag{4.4.30}$$

and this weight vector defines a classification rectangle with the following coordinates:

$$(\mathbf{u}_2, \mathbf{v}_2) = [0.24, 0.75, 0.3, 0.8] \tag{4.4.31}$$

But before the previous assertion can be made, we must check that the rectangle does not
exceed the maximum allowed size given by:

$$|R_2| \le (1 - \rho)M: \quad (0.3 - 0.24) + (0.8 - 0.75) \le (1 - 0.9) \times 2 \tag{4.4.32}$$

and we find that 0.11 < 0.2, so the rectangle grows within allowable size limits. Note that
in feature space (Figure 4.4.6) the region describing the pine class has grown the
minimum amount necessary to surround both pixels that belong to this class.
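
The three training steps of this example can be replayed numerically. The sketch below is an illustration under simplifying assumptions, not the study's code: only committed nodes are stored, nodes are tested against the vigilance criterion in index order (sufficient here because each pixel matches at most one node), and the variable names are hypothetical.

```python
def fuzzy_and(p, q):
    """Element-wise minimum of two equal-length vectors."""
    return [min(x, y) for x, y in zip(p, q)]


RHO, BETA = 0.9, 1.0   # vigilance and fast-learning rate used in the example
pixels = [("water", [0.9, 0.1, 0.1, 0.9]),
          ("pine", [0.24, 0.75, 0.76, 0.25]),
          ("pine", [0.3, 0.8, 0.7, 0.2])]

weights = []           # committed nodes only (simplifying assumption)
trace = []             # (class label, zero-based node index) per pixel
for label, I in pixels:
    node = None
    for j, w in enumerate(weights):
        if sum(fuzzy_and(I, w)) / sum(I) >= RHO:    # vigilance test, eq. 4.4.8
            node = j
            break
    if node is None:                                 # commit a node of all ones
        weights.append([1.0] * len(I))
        node = len(weights) - 1
    w = weights[node]
    weights[node] = [BETA * m + (1.0 - BETA) * wo    # learning rule, eq. 4.4.9
                     for m, wo in zip(fuzzy_and(I, w), w)]
    trace.append((label, node))

u2 = weights[1][:2]                                  # lower left corner
v2 = [1.0 - c for c in weights[1][2:]]               # upper right corner
size_r2 = sum(v - u for u, v in zip(u2, v2))         # |R_2|, eq. 4.4.13
```

Running the sketch assigns the water pixel to the first node and both pine pixels to the second, leaves the second weight vector at [0.24, 0.75, 0.7, 0.2], and gives a rectangle size of about 0.11, in agreement with equations 4.4.30 and 4.4.32.
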


Figure 4.4.6 - two-dimensional feature space
representation of the pine and water weight vectors
(recoverable plot labels: water at (0.9, 0.1); axis label: blue)


The inter-ART field, represented as map field $F^{ab}$ in Figure 4.4.7, has two
missions. First, it maps the classification from ART$_a$ to the classification output of
ART$_b$, and secondly it realizes the match tracking rule. The dynamics of the map field
are illustrated in Figure 4.4.7 below.


Figure 4.4.7 - Representation of
the dynamics of the inter-ART field

We are initially interested in what occurs when the ART$_a$ and ART$_b$ modules are
active and in agreement. When this occurs $\mathbf{x}^{ab}$ (the output vector from field $F^{ab}$) becomes:

$$\mathbf{x}^{ab} = \mathbf{y}^b \wedge \mathbf{w}_J^{ab} \tag{4.4.33}$$

which should be interpreted as the fuzzy AND of the classification output from the ART$_b$
module and the weights between ART$_a$'s Jth classification field node and the map field.
Note that all components of this weight vector, like the others previously discussed, are
initially set equal to one (the one-to-one mapping between $F^{ab}$ and $F_2^b$ is always
accomplished with unity gain). Fuzzy ANDing the classification field weight vector with
the classification result from ART$_b$ permits the network to derive a mapping from the
input vectors that activate the same node in ART$_a$ to the correct classification node in
ART$_b$. Once node J in $F_2^a$ learns to predict node K in $F_2^b$, one element of the weight
vector between them is set to one for all time. In our notation, this rule can be
represented as:

$$w_{JK}^{ab} = 1 \tag{4.4.34}$$

It is crucial to note that the activation of different nodes in $F_2^a$ may be mapped to the same
output classification class. This permits the mapping of "many to one". Understanding
this result is vital to understanding one of the greatest strengths of this network. Even
though many different grass pixels may activate different classification nodes in ART$_a$
(and therefore must be located in different hyper-rectangles in feature space) they are
nevertheless still part of the same classification class and will be correctly mapped to it.
The "many-to-one mapping" is depicted in Figure 4.4.8 on the following page.

When there is a mismatch during training between the output of ART$_a$ and the
correct classification of ART$_b$, match tracking occurs. This is mathematically triggered
when:

$$|\mathbf{x}^{ab}| < \rho_{ab}\,|\mathbf{y}^b| \tag{4.4.35}$$


Figure  4.4.8  -  Graphical  representation  of  the  mapping  of  many  classification 
nodes  to  the  same  output  classification  class 


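The match tracking rule then increases ART$_a$'s vigilance parameter $\rho_a$ until the correct
node in ART$_a$ is activated, and this occurs when:

$$|\mathbf{x}^a| = |\mathbf{I} \wedge \mathbf{w}_J^a| \ge \rho_a\,|\mathbf{I}| \tag{4.4.36}$$

and this will drive the output of the map field to be:

$$|\mathbf{x}^{ab}| = |\mathbf{y}^b \wedge \mathbf{w}_J^{ab}| \ge \rho_{ab}\,|\mathbf{y}^b| \tag{4.4.37}$$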

In the event that no node can be found that satisfies these equations, all of the nodes in $F_2^a$
are set to zero. In essence, by shutting down all classification nodes due to a pixel that
does not map to any of the hyper-rectangles in feature space, the neural network is
responding, "I don't know". This powerful result will cause pixels that are not in any of
the classes to be mapped to the background class.
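
The map field test and the match tracking step can be sketched as follows. This is a hedged illustration, not the code of this study: the function names are hypothetical, the target y_b is assumed to be a one-of-K (one-hot) class vector, and the small increment epsilon by which the vigilance is raised above the current match value is an assumption of the sketch.

```python
def map_field_output(y_b, w_ab_J):
    """x^ab = y^b ^ w_J^ab (eq. 4.4.33)."""
    return [min(y, w) for y, w in zip(y_b, w_ab_J)]


def prediction_ok(y_b, w_ab_J, rho_ab):
    """False when eq. 4.4.35 holds, that is, when match tracking is triggered."""
    x_ab = map_field_output(y_b, w_ab_J)
    return sum(x_ab) >= rho_ab * sum(y_b)


def raised_vigilance(I, w_a_J, epsilon=1e-6):
    """ART_a vigilance just large enough to reset the currently active node J."""
    match = sum(min(i, w) for i, w in zip(I, w_a_J)) / sum(I)
    return match + epsilon
```

With the target [0, 1], a node whose map field weights are [1, 0] fails the test and triggers match tracking, whereas a node with weights [0, 1], or an uncommitted node with weights [1, 1], passes.
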

Once the network is suitably trained, the ART$_b$ module is disconnected. Pixels
from the image are sequentially presented to the preprocessing fields of ART$_a$ and their
resulting classification is read at the output of the map field. At classification time, the
output of the map field is:

$$\mathbf{x}^{ab} = \mathbf{w}_J^{ab} \tag{4.4.38}$$

which is simply the weight vector between $F_2^a$ and $F^{ab}$. Recall that this vector will have
all of its elements equal to zero except for one whose value will be one. This vector
simply contains the encoded classification class. When all of the nodes in $F_2^a$ are equal to
zero, the output from the map field is:

$$\mathbf{x}^{ab} = \mathbf{0} \tag{4.4.39}$$

which implies a result of "I don't know" or assignment to the background class.

The fuzzy ARTMAP architecture is uniquely suited to classifying remotely
sensed images. No other neural networks combine the great strengths of ARTMAP, such
as ART dynamics, exception handling capability, and the ability to effectively deal with
analog data in such a stable and rapid learning environment.


4.5 Reporting the Results

As previously discussed, the simple LANDSAT TM image in Figure 4.5.1 is
composed of four major classes: water, grass, deciduous trees, and pine trees. Assume
that the image is 100 pixels by 100 pixels. We will also assume that 15% of the image
(1,500 pixels) is composed of deciduous trees, 25% is made up of pine trees (2,500), 20%
is water (2,000), and the remaining 40% (4,000) is made up of grass.


Figure 4.5.1 - depiction of
LANDSAT TM scene


The  accuracy  of  a  particular  classification  algorithm  often  is  illustrated  through 
the  use  of  a  simple  mathematical  construct  termed  a  confusion  or  error  matrix.  Consider 
the  confusion  matrix  in  Figure  4.5.2,  which  represents  the  results  attained  by  segmenting 
the  artificial  LANDSAT  TM  scene  with  a  particular  classification  algorithm. 

Figure 4.5.2 - a simple confusion matrix

The entries along the bottom of the matrix represent the ground truth classes and the
entries along the right edge represent the classification as performed by the algorithm.

The  classification  total  represents  the  sum  across  a  particular  row,  while  the  ground  truth 
total  is  calculated  by  summing  down  each  column.  The  results  in  this  matrix  can  lead  to 
some  important  conclusions.  The  "grass"  column  indicates  that  of  the  4,000  ground  truth 
grass  pixels,  3,885  were  correctly  classified,  90  were  incorrectly  classified  as  pine  trees, 
and  25  were  also  incorrectly  classified  as  deciduous  (leaf)  trees.  Note  that  none  of  the 
grass  pixels  were  incorrectly  classified  as  water.  This  should  not  be  surprising  as  their 
respective  mean  vectors  are  probably  widely  separated,  leading  to  low  classification  error 
and  little  confusion.  This  is  not  true  for  the  pine  and  leaf  columns,  where  considerable 
confusion  (between  20  and  26  percent)  is  evident  between  the  two  tree  types.  This 
confusion  is  not  unexpected  as  the  two  mean  vectors  are  likely  to  be  separated  only 
slightly,  leading  to  some  misclassification. 

The  overall  accuracy  of  a  classification  algorithm  often  is  reported  as  a  simple 
accuracy.  This  metric  is  calculated  by  summing  the  correctly  classified  pixels  along  the 
diagonal  of  the  confusion  matrix  and  dividing  by  the  total  number  of  pixels.  For  the 
preceding  confusion  matrix,  this  results  in  a  simple  accuracy  measurement  of: 

$$\frac{3885 + 2000 + 1985 + 1103}{10000} = 89.73\% \tag{4.5.1}$$

Note  that  this  measure  does  not  account  for  any  off-diagonal  terms  in  the  confusion 
matrix.  Also  note  that  this  figure  of  merit  is  artificially  inflated  due  to  the  highly 
accurate,  although  easily  accomplished,  segmentation  of  the  water  pixels. 
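
As a quick check of equation 4.5.1, the simple accuracy can be computed from the diagonal counts quoted in the text. The variable names are illustrative, and "leaf" stands for the deciduous class.

```python
# Correctly classified pixels on the diagonal of the confusion matrix.
diagonal = {"grass": 3885, "water": 2000, "pine": 1985, "leaf": 1103}
total_pixels = 100 * 100          # the 100 by 100 pixel example scene

simple_accuracy = sum(diagonal.values()) / total_pixels
```
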

In an attempt to account for varying class content in an image, the weighted
accuracy measurement has been proposed. It is calculated by first dividing the number of
correctly classified pixels in each class by the number of pixels present in that class. These
per-class accuracies are then summed and divided by the number of classes present in the
image. In the preceding case, this classification accuracy would be calculated and
reported as:

$$\frac{1}{4}\left(\frac{3885}{4000} + \frac{2000}{2000} + \frac{1985}{2500} + \frac{1103}{1500}\right) = 87.5\% \tag{4.5.2}$$

Also note that this figure does not account for the off-diagonal terms in the confusion
matrix.
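
The weighted accuracy can be checked the same way, using the diagonal counts and the ground truth class totals given for the example scene. This is a sketch with illustrative variable names.

```python
# Correctly classified pixels and ground truth totals for each class.
correct = {"grass": 3885, "water": 2000, "pine": 1985, "leaf": 1103}
truth = {"grass": 4000, "water": 2000, "pine": 2500, "leaf": 1500}

# Per-class accuracy, then the unweighted mean over the classes.
per_class = {c: correct[c] / truth[c] for c in correct}
weighted_accuracy = sum(per_class.values()) / len(per_class)
```

The per-class values (about 0.971, 1.000, 0.794, and 0.735) average to about 0.875, and they show directly that the two tree classes account for most of the error.
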

To combat the problems associated with the simplistic computation of
classification accuracy just described, a measure will be introduced that accounts for the
off-diagonal terms and compensates for chance agreement. This measurement is termed
the kappa coefficient and is denoted $\kappa$. The development that follows essentially
summarizes the development in Rosenfield and Fitzpatrick-Lins (1986) with the
considerations for other coefficients coming from Foody (1992).

The  confusion  matrix  in  Figure  4.5.3  is  identical  to  that  in  Figure  4.5.2  with  the 
addition  of  the  simple  row  and  column  marginals: 

Figure 4.5.3 - a simple confusion matrix
with row and column marginals added


The marginals, $P_r$ and $P_c$, are simply calculated by dividing the sum across a row or down
a column by the total number of data elements. With these simple calculations, the kappa
coefficient can be calculated as follows:


$$\kappa = \frac{P_o - P_e}{1 - P_e} \tag{4.5.3}$$

where the observed proportion of agreement $P_o$ is defined as the proportion of
correctly classified pixels. Completing these calculations for the previously defined
example results in:

$$P_o = \frac{3885 + 2000 + 1985 + 1103}{10000} = 0.8973 \tag{4.5.4}$$


This is simply the sum of the diagonal terms of the confusion matrix divided by the total
number of data elements. The proportion of agreement due to chance, $P_e$, is defined as:

$$P_e = \sum_{i=1}^{M} P_r(i)\,P_c(i) \tag{4.5.5}$$

where M is the total number of target classification classes (4 in this case). In this case:

$$P_e = \sum_{i=1}^{M} P_r(i)\,P_c(i) = 0.39 \times 0.4 + 0.2 \times 0.2 + 0.25 \times 0.25 + 0.16 \times 0.15 = 0.2825 \tag{4.5.6}$$

Substitution yields a value for the kappa coefficient:

$$\hat{\kappa} = \frac{P_o - P_e}{1 - P_e} = \frac{0.8973 - 0.2825}{1 - 0.2825} = 0.8569 \tag{4.5.7}$$


This  value  typically  is  multiplied  by  100  and  presented  as  a  percentage.  This  results  in  a 
kappa  coefficient  of  85.69%,  somewhat  lower  than  the  87.5%  weighted  classification 
accuracy  as  previously  calculated  for  the  same  confusion  matrix. 
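
The kappa calculation of equations 4.5.3 through 4.5.7 can be reproduced with the short sketch below. The function name is hypothetical, and the marginals are the rounded values quoted in equation 4.5.6 rather than values recomputed from the full confusion matrix.

```python
def kappa(p_o, row_marginals, col_marginals):
    """Kappa coefficient from observed agreement and the marginals (eq. 4.5.3)."""
    p_e = sum(r * c for r, c in zip(row_marginals, col_marginals))  # eq. 4.5.5
    return (p_o - p_e) / (1.0 - p_e)


# Class order: grass, water, pine, leaf.
k = kappa(0.8973, [0.39, 0.20, 0.25, 0.16], [0.40, 0.20, 0.25, 0.15])
```
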

Foody (1992) notes that $P_e$ is calculated from marginals that include the diagonal terms of
the confusion matrix. These terms denote actual agreement, and therefore improperly
inflate the proportion of agreement due to chance. He suggests utilizing the following
measurement, named Brennan and Prediger's kappa after its creators:

$$\kappa_{B\&P} = \frac{P_o - \frac{1}{M}}{1 - \frac{1}{M}} \tag{4.5.8}$$

where M once again represents the total number of target classification classes. The 1/M
terms can be justified by the argument that the marginal terms in a typical image
classification algorithm are free parameters and not fixed a priori. Therefore the
probability of chance agreement reduces simply to 1/M. In this example, the metric has a
value of:

$$\kappa_{B\&P} = \frac{0.8973 - \frac{1}{4}}{1 - \frac{1}{4}} = 0.8631 \tag{4.5.9}$$


which  is  slightly  larger  than  the  kappa  coefficient.  Intuitively,  the  larger  value  makes 
sense,  as  we  have  removed  some  of  the  actual  agreement  which  previously  contributed  to 
the  probability  of  agreement  due  only  to  chance.  This  study  will  calculate  both  the 
classical  and  Brennan  and  Prediger's  kappa  coefficient  for  each  image  and  each 
classification  methodology  and  compare  their  results  in  actual  use. 
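
For comparison with the classical coefficient, Brennan and Prediger's kappa (equation 4.5.8) needs only the observed agreement and the number of classes. The function name in this sketch is hypothetical.

```python
def kappa_bp(p_o, n_classes):
    """Brennan and Prediger's kappa: chance agreement is fixed at 1/M (eq. 4.5.8)."""
    chance = 1.0 / n_classes
    return (p_o - chance) / (1.0 - chance)


k_bp = kappa_bp(0.8973, 4)
```

For the example scene this gives about 0.8631, slightly above the classical value of 0.8569, because the assumed chance agreement of 0.25 is smaller than the value of 0.2825 computed from the marginals.
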

Classification  accuracy  obviously  is  one  of  the  chief  concerns  in  this  study,  but 
other  metrics  were  applied  to  the  various  segmentation  algorithms.  They  also  were 
compared  in  terms  of  execution  time  required  for  image  segmentation.  The  time  function 
of  the  computer  was  queried  prior  to  entering  and  after  completing  the  segmentation 
phase  of  each  algorithm.  The  difference  between  these  two  values  is  published  with  the 
results  of  each  classification  run.  Note  that  the  training  time  for  the  neural  network  will 
be  included  in  this  measure,  while  the  creation  of  bounding  polygons  for  the  nPDF  space 
will  not.  The  creation  time  of  the  nPDF  classification  polygons  requires  direct  user 
intervention  and  considerable  care  to  produce  accurate  results.  If  possible,  the  various 


modules will also be "profiled" to determine where most of the computing time is spent.
Compiling and presenting these various measurements will produce a
"snapshot" of the best conditions and related difficulties of the differing approaches.

5.0 Using the Classification Modules in the AVS Environment


The  Advanced  Visualization  System  from  Advanced  Visual  Systems  was  chosen 
to  support  this  study  for  a  number  of  reasons.  Most  importantly,  this  system  provides  an 
extremely  powerful  and  user  extensible  data  visualization  and  image  computing 
environment  for  the  programmer.  A  competent  "C"  programmer  can  expect  to  be  able  to 
write  AVS  modules  in  no  more  than  a  couple  of  weeks.  The  very  rapid  ramp-up  time  can 
be  attributed  to  relieving  the  programmer  of  the  considerable  intricacies  of  the 
X Windows graphical environment and simplifying the problems of data transfer between
the  modules.  As  such,  virtually  all  modules  for  the  environment  are  composed  of  two 
distinct  code  sections.  The  first  defines  the  interfaces  with  other  modules  and  how 
control  parameters  are  passed  to  the  second  code  section.  This  second  section,  the 
compute  function,  is  almost  entirely  composed  of  standard  C  calls,  the  only  exceptions 
being  the  routines  to  access  shared  data  between  the  modules.  The  complete  source  code 
for  each  module  developed  for  this  study  is  included  in  appendix  A  of  this  report. 

Possibly  the  most  attractive  feature  of  the  AVS  environment  is  that  computational 
chains  can  be  readily  represented  as  "networks"  or  "procedures"  of  simple  modules. 
Imaging  operations  that  traditionally  would  take  considerable  time  to  uniquely  code  by 
hand,  either  in  a  low-level  programming  language  or  a  mathematics  package,  can  be 
created  by  merely  connecting  a  few  modules  into  a  network.  In  addition,  many  modules 
for  common  imaging  operations,  such  as  reading,  writing,  and  displaying  standard  image 
formats,  have  already  been  developed  and  tested.  This  study  extended  the  use  of  AVS  at 
RIT  into  image  classification,  but  it  required  only  a  handful  of  specialized  modules.  All 
of  the  inherently  compatible  base  functionality  already  had  been  developed. 

In  the  following  sections  of  this  report,  a  representative  processing  network  for 
each  of  the  classification  operations  will  be  described  along  with  detailed  operating 
instructions for each of the classification modules. As such, this chapter can serve as a
user's guide to image processing with the supplied modules.

5.1 Collecting Training Data in the AVS Environment

Two separate modules and associated networks were constructed for this study to
support the collection of training data for the classification algorithms. The method by
which training data is interactively defined by the user by superimposing polygons on the
image proved to be a difficult project for even an experienced software engineer at RIT.
For this reason, its development was delayed and a method to test the evolving
classification modules in a timely manner became critical. To this end, the Fuzzy
K-Means module was developed and implemented. This module serves a dual role.
Besides being able to collect spectrally pure training data, it can also be used for
unsupervised image classification. Consider the AVS network in Figure 5.1.1.


Figure  5.1.1  -  Depiction  of  Fuzzy  K-Means  network 

In operation, the network reads in a user-selected image, selects a single band, and
displays the image. Pointers to the multispectral image data in shared memory are then
passed to the Fuzzy K-Means module. The control panel for the module is depicted in
Figure 5.1.2. The user provides the algorithm with a number of parameters before
clustering can occur. First, the user must enter the number of clusters to locate and
select a membership value. By setting the membership to a high value (near 1), a few
pixels in very tight spectral groups will be collected for each cluster. When operated in
this manner, the module supports the collection of training data. If the membership is
instead set to a low value, the image can be classified in an unsupervised manner by
calculating the cluster centers and performing a form of
minimum-distance-to-the-means classification with respect to the value of the
membership function. The "Sample offset" parameter permits the input image to be
subsampled. By entering a value of two, every other pixel in every other row of the
image will be used for cluster computations. As expected, increasing this value
dramatically reduces processing time as only a fraction of the pixels in the input image
need be processed. The "Maximum iteration" parameter permits the user to define the
maximum number of clustering iterations. Similarly, the "Cluster shift limit" parameter
is used by the algorithm to determine when the clustering algorithm has converged. A
value of four indicates that the greatest shift of any cluster from one iteration to the next
must be less than or equal to four to end processing. Once the user has supplied these
parameters, only the "Generate clusters" or "Auto-cluster" toggle switch need be
selected. The auto-clustering feature will eventually allow a number of images to be
processed successively. The module then executes the compute function code. It first
selects the appropriate number of pixels pseudorandomly from the input image to serve
as initial approximations of the cluster centers. If the "Skip zero vectors" toggle has been
set, the module checks for and skips any pixels with zero magnitude. This permits
segmented images to be rapidly processed. It then allocates a "membership" array for
each cluster that has the same dimensions as the input image.

Figure 5.1.2 - Depiction of the control panel for the Fuzzy K-Means module

The  membership  value  for  each  pixel  from  the  image  or  subsampled  image  is  then 
calculated  with  respect  to  each  cluster  center  and  then  the  center  approximations  are 
recomputed.  This  process  is  repeated  until  the  algorithm  converges  and  the  display  is 
updated  to  show  the  iteration  number  and  greatest  cluster  shift.  The  clustering  algorithm 
obviously  is  computationally  intensive  and  may  take  a  considerable  amount  of  processing 
time  before  convergence  occurs.  Once  the  conditions  for  convergence  are  met,  the 
elapsed  time  is  reported  and  the  required  number  of  iterations  is  displayed.  At  this  point, 
the  training  data  set  is  collected  and  the  unsupervised  classification  map  is  constructed. 
The membership values with respect to each cluster center for each pixel in the input or
subsampled image are compared to find the maximum value. If this value is equal to or
greater than the value of the membership parameter, the pixel is assigned to the
corresponding cluster and its multispectral pixel vector is added to the training data
linked list. If the greatest membership value is less than the value of the membership
parameter, the pixel is assigned to the background class. The resulting training data in
the form of a
linked  list  is  then  transferred  to  the  write  training  module  so  that  it  can  be  optionally 
saved  to  disk.  Similarly,  the  classification  map  can  be  saved  to  disk  in  a  standard  image 
format.  A  colormap  is  constructed  and  applied  to  the  unsupervised  classification  map  so 
that  the  clustering  and  classification  results  can  be  readily  visualized. 
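
The clustering procedure described in this section can be illustrated with a short sketch. It is not the AVS module's code: the function and parameter names are hypothetical, a plain Python list stands in for the shared-memory image and the linked list, and the standard fuzzy c-means update with a fuzzifier of m = 2 and Euclidean distance is assumed, since the text does not state which membership function the module uses.

```python
import math
import random

def subsample(image, offset):
    """Keep every offset-th pixel of every offset-th row (the "Sample offset")."""
    return [pixel for row in image[::offset] for pixel in row[::offset]]

def memberships(pixel, centers, m=2.0):
    """Fuzzy membership of one pixel in every cluster (assumed fuzzifier m)."""
    d = [math.dist(pixel, c) for c in centers]
    if 0.0 in d:                       # pixel coincides with a center
        hit = d.index(0.0)
        return [1.0 if k == hit else 0.0 for k in range(len(d))]
    inv = [x ** (-2.0 / (m - 1.0)) for x in d]
    total = sum(inv)
    return [v / total for v in inv]

def fuzzy_k_means(pixels, n_clusters, membership=0.9, max_iter=50,
                  shift_limit=4.0, skip_zero=True, init=None, m=2.0, seed=0):
    """Return (centers, kept_pixels, labels); label -1 is the background class."""
    if skip_zero:                      # the "Skip zero vectors" toggle
        pixels = [p for p in pixels if any(p)]
    # Pseudorandom initial centers unless the caller supplies them (hypothetical option).
    centers = [list(c) for c in (init or random.Random(seed).sample(pixels, n_clusters))]
    bands = len(pixels[0])
    for _ in range(max_iter):          # the "Maximum iteration" parameter
        u = [memberships(p, centers, m) for p in pixels]
        updated = []
        for k in range(n_clusters):
            w = [row[k] ** m for row in u]
            total = sum(w)
            if total == 0.0:           # no support: keep the previous center
                updated.append(centers[k])
                continue
            updated.append([sum(wi * p[b] for wi, p in zip(w, pixels)) / total
                            for b in range(bands)])
        shift = max(math.dist(a, b) for a, b in zip(centers, updated))
        centers = updated
        if shift <= shift_limit:       # the "Cluster shift limit" parameter
            break
    labels = []
    for p in pixels:
        u = memberships(p, centers, m)
        best = max(range(n_clusters), key=u.__getitem__)
        labels.append(best if u[best] >= membership else -1)
    return centers, pixels, labels
```

With a membership threshold near 1, only pixels close to a cluster center receive a label, which corresponds to the training-data mode; lowering the threshold labels nearly every pixel, which corresponds to the unsupervised classification mode.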

Figure 5.1.3 - Depiction of network to gather user-defined training data


The network that supports gathering training data from the input multispectral image in
an interactive manner will now be discussed. This module, and the supporting modules
that read and write the training data structures, were written by Stephen Schultz,
resident software engineer at RIT. The programming of these modules required an in-depth
understanding of the X Windows environment that is currently beyond the capability of
the author. Consider the network in Figure 5.1.3. This network reads and displays the
user-selected image. References to the input data are passed to the Build Training Sets
module. The control panel for this module is depicted in Figure 5.1.4.

Figure 5.1.4 - Depiction of the control panel for the Build Training Sets module

To define training-class polygons, the user simply designates the polygon vertices with
the cursor and a mouse click. The module allows multiple polygons to define a single
class. Additional polygons are added to the current training set by selecting the
"Create New Polygon" button. After the user has defined a given class, the process is
repeated for a new class after selecting the "Create New Training Set" button. To
correct any errors, the module permits the user to remove the current vertex, polygon,
or entire training class. A subsection of an image with overlaid training class polygons
is depicted in Figure 5.1.5.

Figure 5.1.5 - Depiction of image with overlaid training class polygons

After all polygons for all classes have been defined, the user selects the "Gather
Training Data" button. This causes the module to retrieve the designated multispectral
pixels from the input image, label them as belonging to a given class, and chain them
together in the form of a linked list. The linked list is then passed on to the Write
Training module, where it is written to a user-specified disk file. The training data
file can then be readily processed by the modules that support the classification
operations.
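
The "Gather Training Data" step can be sketched as follows. This is only an illustration of the idea, not the Build Training Sets module: the names are hypothetical, an even-odd ray-casting test is assumed for deciding whether a pixel lies inside a polygon, pixel centers are assumed to sit at half-integer coordinates, and a Python list of (label, pixel) pairs stands in for the linked list.

```python
def point_in_polygon(x, y, vertices):
    """Even-odd ray-casting test; vertices is a list of (x, y) tuples."""
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        if (y1 > y) != (y2 > y):       # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def gather_training_data(image, training_sets):
    """image[row][col] is a pixel vector; training_sets maps a class label
    to a list of polygons, so one class may be defined by several polygons."""
    samples = []
    for label, polygons in training_sets.items():
        for row, line in enumerate(image):
            for col, pixel in enumerate(line):
                # Test the pixel center (assumed convention).
                if any(point_in_polygon(col + 0.5, row + 0.5, poly)
                       for poly in polygons):
                    samples.append((label, pixel))
    return samples
```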

5.2 Gaussian Maximum Likelihood in the AVS Environment


Accomplishing image classification with the GML Classify and Class Statistics
modules is a straightforward process in the AVS environment. Before an image can be
classified, the representative statistics for the training classes must be calculated.
Figure 5.2.1 depicts the AVS network that is used to create the required statistics
files.

Figure 5.2.1 - Depiction of the class statistics network

Training data collected by either the fuzzy K-means or the user-interactive module is
transferred in the form of a linked list from the Read Training module to the Class
Statistics module. The data is then moved from the linked list to an internal data
structure to support further operations. The control panel that supports the user
interface to the statistics module is depicted in Figure 5.2.2.

Figure 5.2.2 - The class statistics module control panel

Once the training data has been received and an output statistics filename has been
selected, the interface is updated. It displays the number of training sets or classes,
the total number of training class pixels, and the dimensionality of the training set.
When the user selects either the "Generate statistics file" or "Auto-generate statistics
file" button, the module begins the compute function's statistical operations. In
somewhat greater detail, the mean vector, the variance-covariance matrix, and its
inverse are calculated for each training class. Matrix inversion is accomplished through
Lower-Upper (LU) decomposition and backsubstitution. When these operations are complete,
the interface is updated to reflect the elapsed processing time and the resulting class
statistics are written to the user-specified disk file. A sample statistics file is
depicted in Figure 5.2.3.


training  class  statistics  file  for  AVS  GML 
6  bands 
2  classes 


These  are  the  mean  vectors: 

101.558304  87.162682  79.549095  108.158653  117.790672  91.485168  3473.000000 
105.620270  99.025017  91.263634  127.766518  130.973694  98.951248  1559.000000 


These  are  the  inverted  matrices: 

0.148350  -0.031817  0.079483  -0.010912  -0.002541  -0.030949
-0.031817  0.339271  -0.272071  -0.005770  0.050755  0.034068 
0.079483  -0.272071  0.444432  0.002016  -0.055556  -0.025385 
-0.010912  -0.005770  0.002016  0.160764  -0.005757  0.015016 
-0.002541  0.050755  -0.055556  -0.005757  0.211132  -0.045791 
-0.030949  0.034068  -0.025385  0.015016  -0.045791  0.203757 

0.402535  -0.241097  0.073095  0.037794  0.001006  -0.067547 
-0.241097  0.601827  -0.238123  -0.037646  0.051879  0.028189 
0.073095  -0.238123  0.422447  0.031728  -0.047003  0.000208 
0.037794  -0.037647  0.031728  0.184273  -0.029169  0.011340 
0.001006  0.051879  -0.047003  -0.029169  0.233353  -0.056189 
-0.067547  0.028189  0.000208  0.011340  -0.056189  0.237680 

These  are  the  natural  logs  of  the  determinants: 

9.728889  7.587222 

These  are  the  normalized  vc  matrices:

7.810011  -0.872250  -1.863393  0.423204  0.059587  1.082154 
-0.872250  6.033162  3.718243  0.165424  -0.659772  -0.838456 
-1.863393  3.718243  4.887069  -0.022048  0.320907  -0.222125 
0.423204  0.165424  -0.022048  6.296008  0.039846  -0.421174 
0.059587  -0.659772  0.320907  0.039846  5.272109  1.341222 
1.082154  -0.838456  -0.222125  -0.421174  1.341222  5.517158

3.484727  1.426702  0.223407  -0.536731  -0.159543  0.808821
1.426702  2.768975  1.273246  -0.003784  -0.368406  -0.010975
0.223407  1.273246  3.100528  -0.270882  0.305396  -0.005112
-0.536731  -0.003784  -0.270882  5.693850  0.592060  -0.283535
-0.159543  -0.368406  0.305396  0.592060  4.767593  1.096926
0.808821  -0.010975  -0.005112  -0.283535  1.096926  4.711347


These  are  the  determinants: 

16795.867188  1972.824951 

Figure  5.2.3  -  Sample  statistics  file 
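
The per-class quantities stored in such a statistics file can be reproduced with a small sketch. It is an illustration under stated assumptions rather than the Class Statistics module itself: the function names are hypothetical, the covariance is assumed to use the unbiased (n - 1) divisor, and the LU routine is a generic Doolittle factorization with partial pivoting followed by forward and back substitution.

```python
import math

def lu_decompose(a):
    """Doolittle LU with partial pivoting. Returns (lu, perm, sign)."""
    n = len(a)
    lu = [row[:] for row in a]
    perm = list(range(n))
    sign = 1.0
    for k in range(n):
        pivot = max(range(k, n), key=lambda r: abs(lu[r][k]))
        if lu[pivot][k] == 0.0:
            raise ValueError("singular covariance matrix")
        if pivot != k:
            lu[k], lu[pivot] = lu[pivot], lu[k]
            perm[k], perm[pivot] = perm[pivot], perm[k]
            sign = -sign
        for r in range(k + 1, n):
            lu[r][k] /= lu[k][k]
            for c in range(k + 1, n):
                lu[r][c] -= lu[r][k] * lu[k][c]
    return lu, perm, sign

def lu_solve(lu, perm, b):
    """Solve A x = b from the factorization of A."""
    n = len(lu)
    y = [0.0] * n
    for i in range(n):                 # forward substitution (unit-diagonal L)
        y[i] = b[perm[i]] - sum(lu[i][j] * y[j] for j in range(i))
    x = [0.0] * n
    for i in reversed(range(n)):       # back substitution with U
        x[i] = (y[i] - sum(lu[i][j] * x[j] for j in range(i + 1, n))) / lu[i][i]
    return x

def class_statistics(pixels):
    """Mean vector, covariance, inverse, determinant and its natural log."""
    n, bands = len(pixels), len(pixels[0])
    mean = [sum(p[b] for p in pixels) / n for b in range(bands)]
    cov = [[sum((p[i] - mean[i]) * (p[j] - mean[j]) for p in pixels) / (n - 1)
            for j in range(bands)] for i in range(bands)]   # assumed n - 1 divisor
    lu, perm, sign = lu_decompose(cov)
    det = sign * math.prod(lu[i][i] for i in range(bands))
    # Solving against each unit vector yields one column of the inverse.
    columns = [lu_solve(lu, perm, [1.0 if r == c else 0.0 for r in range(bands)])
               for c in range(bands)]
    inverse = [[columns[c][r] for c in range(bands)] for r in range(bands)]
    return {"count": n, "mean": mean, "covariance": cov, "inverse": inverse,
            "det": det, "ln_det": math.log(det)}
```

As a consistency check on the sample file in Figure 5.2.3, the natural logarithms of the two listed determinants (16795.867188 and 1972.824951) are approximately 9.7289 and 7.5872, which agree with the listed log-determinant values.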

Once a statistics file has been created, image classification can be accomplished readily.
Consider the network in Figure 5.2.4, which is utilized to perform GML classification.

Figure 5.2.4 - GML classification network


The network reads the multispectral image and permits the user to select a single band to
display the input image. A reference to the multispectral image data (in the form of a
pointer to shared memory) is transferred to the GML Classification module. The control
panel for this module is depicted in Figure 5.2.5.

Figure 5.2.5 - The GML classification control panel

This parametric classifier requires a model of the training data created by the Class
Statistics module as previously discussed. Once the user selects a valid statistics file,
the data from the selected file is printed to the standard output device and the
interface is updated to reflect the classification parameters, the number of target
classes, and the dimensionality of the data. At this point, the user need enter only an
appropriate value for the chi-squared (χ²) distance and select either the "Generate GML
map" or "Auto-generate GML" button. The auto-generation feature allows the same image to
be easily reclassified by just entering a different value of χ². The module then enters
its compute function. The "Skip zero vectors" toggle allows segmented images to be
rapidly processed. Before being processed, the magnitude of each pixel is calculated; if
zero, the pixel is skipped. The Mahalanobis distance for each non-zero pixel in the input
image is then calculated with respect to the mean vector of each target class utilizing
its unique variance-covariance matrix. As expected, if the image size or the number of
target classes is increased, classification time increases proportionally. If the
minimum distance value to one class is less than the user-supplied value for χ², the
pixel is assigned to that class. The value in the classification map at the position of
the input pixel then is updated to reflect this assignment. If the minimum distance
exceeds the limiting value, the pixel is assigned to the background class. Once the
entire classification map is constructed in the manner just described, the interface is
updated to reflect the elapsed computation time in seconds. The network then constructs
a colormap and applies it to the classification map so that the classification results
can be readily visualized. The classification map is then converted to a standard image
format and optionally is written to a user-defined file.
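
The per-pixel decision described in this section can be sketched as below. The names are hypothetical, class labels are assumed to be 1-based with 0 reserved for the background class, and the squared Mahalanobis distance is assumed to be the quantity compared against the χ² limit. A full Gaussian maximum likelihood discriminant would normally also involve the log-determinant term stored in the statistics file; how the module combines the two is not stated here, so the sketch follows only the distance test given in the text.

```python
def mahalanobis_sq(pixel, mean, inverse):
    """Squared Mahalanobis distance of a pixel from a class mean."""
    d = [x - m for x, m in zip(pixel, mean)]
    n = len(d)
    return sum(d[i] * inverse[i][j] * d[j] for i in range(n) for j in range(n))

def classify_pixel(pixel, classes, chi_sq_limit, skip_zero=True, background=0):
    """classes is a list of (mean, inverse_covariance) pairs, one per target class."""
    if skip_zero and not any(pixel):   # the "Skip zero vectors" toggle
        return background
    distances = [mahalanobis_sq(pixel, mean, inv) for mean, inv in classes]
    best = min(range(len(classes)), key=distances.__getitem__)
    # Assign the nearest class only if it lies inside the chi-squared limit.
    return best + 1 if distances[best] < chi_sq_limit else background

def classify_image(image, classes, chi_sq_limit):
    """Build the classification map, one label per input pixel."""
    return [[classify_pixel(p, classes, chi_sq_limit) for p in row] for row in image]
```

Because every non-zero pixel is compared with every class, the work grows with both the image size and the number of classes, which matches the proportional increase in classification time noted in the text.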

5.3 Performing nPDF Classification in the AVS Environment


The inherent graphical nature of the AVS environment is well suited to
performing classification with the n-Dimensional Probability Density Function (nPDF)
algorithm. Recall that this approach requires that the multispectral training data
first be projected into "nPDF" space to construct a LookUp Table (LUT) for image
classification. Figure 5.3.1 depicts the AVS network that is used to project
the multispectral training data into the two-dimensional nPDF space.


Figure 5.3.1 - Depiction of the network to project training data into nPDF space


This network reads in a user-specified training data file and passes it as a linked list
to the nPDF LUT module. At that module, the training data is transferred from the linked
list into a more easily manipulated internal data structure. The control panel for the
nPDF LUT module is depicted in Figure 5.3.2.

Figure 5.3.2 - Control panel for the nPDF LUT module

The user then selects a pair of hypercube corners by depressing one of the radio button
options, and enters a value for the scale factor for the resulting nPDF projection.
After these two parameters have been supplied, the user need only select the "Generate
nPDF plot" or "Auto-generate nPDF plot" button to begin the projection process. The
auto-generation feature allows the same input training set to be repeatedly projected
with a variety of corner choices and scale factor values. Once the module enters the
compute function, the interface is updated to reflect the dimensionality and the number
of classes in the training data set. The algorithm then projects each pixel from the
training set into nPDF space as defined by the corner choice and scale factor
parameters. When all pixels have been processed, the interface is updated to display the
elapsed processing time and the projection is passed to the rest of the network. First,
the nPDF projection is divided by itself to form a binary image, converted to a standard
image file format, and then optionally written to a disk file. The binary projection
image must be processed by another network before it can be used as a LUT for
classification purposes. The projected image is scaled and displayed to permit visual
inspection. If no bundles of the training data overlap, good classification results can
be expected. If the bundles do overlap, the corner pairs and scale factor are varied
until the bundles no longer intersect, or such intersection is minimized. Continuing on
with the network, a subsection of the displayed image can be selected. A colormap is
then generated and applied to this subsection. The resulting nPDF projection can be
converted to a height field. The resulting colored three-dimensional field can be viewed
with the AVS geometry viewer. In Figure 5.3.3, the nPDF projection for a five-class
training set is depicted as a flat projection and as a height field.

Figure 5.3.3 - nPDF projection of training data depicted as a height field and a flat
projection.
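
A rough sketch of the projection step is given below. The projection formula itself is defined elsewhere in this report and is not repeated in this section, so the sketch rests on an assumption: each nPDF coordinate is taken to be the Euclidean distance from the pixel to one of the two chosen hypercube corners, divided by the length of the hypercube diagonal and multiplied by the scale factor. The function names, the 8-bit maximum value of 255, and the clamping to the plot edge are likewise illustrative choices.

```python
import math

def npdf_project(pixel, corners, scale, max_value=255):
    """Map an n-band pixel to integer (x, y) coordinates in nPDF space.

    corners holds the two chosen hypercube corners, each an n-tuple whose
    components are 0 or max_value (assumed representation of the corner choice).
    """
    diagonal = max_value * math.sqrt(len(pixel))   # longest possible distance
    return tuple(min(scale - 1, int(scale * math.dist(pixel, c) / diagonal))
                 for c in corners)

def projection_image(pixels, corners, scale, max_value=255):
    """Binary scale-by-scale image with a 1 wherever a training pixel lands."""
    image = [[0] * scale for _ in range(scale)]
    for p in pixels:
        x, y = npdf_project(p, corners, scale, max_value)
        image[y][x] = 1
    return image
```

Changing either the corner pair or the scale factor changes where the class bundles fall in the plot, which is why both are varied when bundles overlap.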

As  mentioned  in  section  4.3  of  this  report,  the  nPDF  projection  must  be  processed 
before  it  can  be  used  as  a  LUT  for  classification  purposes.  In  a  manner  similar  to  that  for 
constructing  polygons  around  training  data  of  interest,  a  boundary  is  drawn  around  each 
individual  cluster.  Care  must  be  taken  to  completely  encompass  a  class  and  to  provide 
precise  boundaries  between  clusters,  as  the  accuracy  of  the  resulting  classification 
depends  entirely  on  the  accuracy  of  the  LUT.  It  is  also  important  to  realize  that  the 
complete  spectral  extent  of  a  class  must  be  encircled  by  its  class  polygon  in  nPDF  space. 
One  method  for  accomplishing  this  is  to  attach  a  piece  of  acetate  to  the  computer's 
display  and  sketch  the  position  of  the  training  data  clusters.  Once  this  has  been 
accomplished,  the  nPDF  projection  of  the  entire  image  to  be  classified  is  then  displayed. 
By  utilizing  a  straightedge  and  a  pen,  the  projection  of  the  entire  image  can  be 
segmented  readily  into  classification  regions.  Once  a  classification  region  has  been 
completely  surrounded,  the  region  is  then  filled  with  the  numerical  value  for  that  class. 
Each individual region is saved to a disk file as an image. The network in Figure 5.3.4
is utilized to create the individual classification regions and fill them with the
appropriate class value. The network reads in an image (typically the nPDF projection of
the image to be classified), scales its values to promote interpretability, and displays
it. The classification regions are individually designated with the Select Polygon
Region module. These regions are passed on to the Fill Polygon Region module, where the
designated classification regions are filled with the appropriate class value. The
separate classification region images are then written to a disk file in a standard
format.

Once  the  individual  classification  regions  have  been  designated,  they  must  be 
combined  into  a  single  LUT.  The  network  depicted  in  Figure  5.3.5  is  utilized  for  this 
purpose. This AVS procedure reads in two separate classification region images at a
time, scales each image, and displays them. They are then combined with a logical OR
operation as provided by the Or Image module. The resulting image is written to a disk
file. In operation, the user must read the class-one and class-two images and combine
them to form the intermediate LUT for classes one and two. This process is repeated
until all of the individual class images have been combined into a single LUT.
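
The pairwise combination can be sketched in a few lines. The function names are hypothetical and nested Python lists stand in for the region images; each region image is assumed to hold its class value inside the region and zero elsewhere.

```python
from functools import reduce

def or_images(a, b):
    """Pixel-wise bitwise OR of two equally sized integer images."""
    return [[pa | pb for pa, pb in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]

def build_lut(region_images):
    """Fold the individual class-region images into a single LUT, two at a time."""
    return reduce(or_images, region_images)
```

One consequence of using OR is worth noting: if two regions overlap, the overlapping cells receive the bitwise OR of both class values (for example, classes 1 and 2 would yield 3), which is a further reason the region boundaries must not intersect.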


Figure 5.3.4 - AVS network to create individual classification regions in nPDF space.

Figure 5.3.5 - AVS network to combine the individual classification region images into
a LUT.


The  images  in  Figure  5.3.6  highlight  the  process  from  nPDF  projection  to 
classification  LUT. 


Figure 5.3.6 - nPDF projection of an image, nPDF projection
of training data from image, and resulting classification LUT


Once the LUT has been constructed, the image is classified easily. The AVS network for
nPDF classification is depicted in Figure 5.3.7.

Figure 5.3.7 - Depiction of network to accomplish nPDF classification

This network reads in the LUT, displays it, and converts it to the format expected by
the classifier. Similarly, the input image is read, a single band is selected, and the
image is displayed. The LUT data and the multispectral image data are passed to the nPDF
Classification module. The control panel constituting the user interface to this module
is depicted in Figure 5.3.8. To begin classification, the user needs to select the same
corners of the hypercube as were used to project the data. At present, AVS does not
permit this information to be transferred in any other manner. The simplest workaround
is to include the corner choice in the name of the LUT. Once the user has supplied the
corner-choice selection, the "Classify Image" or "Auto-classify Image" button must be
selected before classification can begin. The auto-classification feature eventually
will allow multiple images to be successively processed by the module. Once
classification has begun, the interface is updated to reflect the scale of the LUT,
which is either its height or width dimension. The "Skip zero vectors" toggle allows for
the rapid processing of images with many zero magnitude vectors. Each non-zero pixel
from the input image is projected into nPDF space as defined by the combination of the
corner choice and the scale factor. To achieve classification, the coordinates of each
input pixel in nPDF space are used to reference a location in the LUT. If the pixel
projects into one of the classification regions, the numerical value of that region is
retrieved. If instead the pixel projects onto an undesignated region, the pixel is
assigned to the background class. Note that increasing the number of classes has no
impact on classification time, but the LUT will be more difficult and time consuming to
generate. Once all of the input image's pixels have been classified, the elapsed
computational time is displayed. The network completes operation by building a colormap,
applying it to the classification map, and displaying the results to aid visual
inspection. The classification map is then converted to a standard image format and may
be written to a disk file.
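
The lookup step can be sketched as follows. The names are hypothetical; `project` is an assumed caller-supplied function that maps a pixel vector to integer (x, y) coordinates in nPDF space using the same corner choice and scale factor that produced the LUT, and undesignated LUT cells are assumed to hold the background value 0.

```python
def classify_with_lut(image, lut, project, skip_zero=True, background=0):
    """Return a classification map with one LUT value per input pixel."""
    class_map = []
    for row in image:
        out = []
        for pixel in row:
            if skip_zero and not any(pixel):   # the "Skip zero vectors" toggle
                out.append(background)
                continue
            x, y = project(pixel)              # coordinates in nPDF space
            out.append(lut[y][x])              # region value, or 0 if undesignated
        class_map.append(out)
    return class_map
```

Each pixel costs one projection and one table lookup regardless of how many classes the LUT contains, which is why adding classes does not lengthen classification time.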

Any classification algorithm can also be utilized for image segmentation. This
operation screens out all pixels that do not belong to a desired class and forwards all
multispectral pixels that do comprise a selected class. Instead of creating a
classification map, a new multispectral image is created that consists solely of the
desired class or classes.

Image segmentation via the nPDF algorithm will be used as the first step in hybrid
image classification. In this case, the term hybrid implies that an nPDF segmented image
will be passed to either the GML or fuzzy ARTMAP algorithm for further processing. In
this manner, the strengths of either classification algorithm can be tested on tightly
grouped spectral classes. The network in Figure 5.3.9 supports the first phase in hybrid
image classification by nPDF segmentation.

Figure 5.3.8 - Control panel for the nPDF Classification module

Figure 5.3.9 - AVS network to accomplish nPDF image segmentation

This AVS procedure functions identically to that for nPDF classification with the
exception that it produces a multispectral image of the class or classes defined in the
LUT. The resulting image is then written to disk in a standard format.
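
Segmentation can be sketched as a masking operation on top of a classification map. The function name is hypothetical, and the sketch assumes that screened-out pixels are written as zero vectors, which would be consistent with the "Skip zero vectors" toggle that the downstream modules provide for segmented images.

```python
def segment_image(image, class_map, keep):
    """Keep multispectral pixels whose label is in keep; zero out the rest."""
    return [[pixel if label in keep else tuple(0 for _ in pixel)
             for pixel, label in zip(row, labels)]
            for row, labels in zip(image, class_map)]
```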

5.4 Accomplishing Fuzzy ARTMAP Classification in the AVS Environment

Accomplishing image classification with the Make ARTMAP and Fuzzy ARTMAP
modules is very similar to the process outlined for the GML module. Before an input
image can be classified, a fuzzy ARTMAP neural network and associated parameter file
must be created from the training data. The parameter file conveys information about the
constructed network such as the dimensionality of the input space, the number of ARTa
classification nodes, and the number of target classes (or ARTb classification nodes). A
complete commented parameter file is depicted in Figure 5.4.1.


1  #define randomInit - seed for random number generator "1" = read time clock
1  #define orderedSet "1" = read in the patterns in the same order as original file
0  #define employ_weighting "1" = input features are differentially weighted
0  #define on_line - "1" = system goes through training set once
1  #define num_voters
1  #define num_runs - number of complete training runs
10  #define max_iterations - maximum training iterations
1  #define complement "1" = complement is presented
2  #define a_length - the dimensionality of the input space
2  #define b_length - the dimensionality of the output space
2  #define num_data_category - number of classes
700  #define train_pats - the number of training patterns
0  #define predict_pats - the number of predict patterns
300  #define test_pats - the number of test patterns
0  #define on_line_recast - method of recasting to handle inconsistent cases
1.0  #define dninit - the initial top down weights of both ART modules
0.001  #define Alpha - parameter used in choiceOrderfunction of ART1
0.001  #define epsilon lA-match -> rho goes to current match + epsilon.
1.0  #define rate - under learning, wij -> (1 - rate) * wij + rate * fast_learn_limit
0.0  #define z_bar - in art1.c, if top down weight drops below zbar, both weights -> 0
0.0  #define min_arho - minimum ART-A rho value
1.0  #define brho - minimum ART-B rho value
0.0  #define noise_rate at which recent success or failure change confidence of nodes
0.0  #define noise_tolerance below which an ART-A node will be destroyed
0  #define trace - if > 0, then progresses of program will be reported
0  #define trace_weight - allows features to be weighted separately
1  #define display_confusion_matrix - if 1, confusion matrix is displayed
0  #define incorporate_rule
0  #define extract_rules
0  #define quantize_rules
0  #define quantization_step
0  #define individual match_tracking
1  #define num_F2_winners - number of winning classification nodes
0.0  #define threshold


Figure 5.4.1 - Example ARTMAP parameter file

In general, a casual user need not be concerned with many individual parameters in the
file as they are either defaults, have been optimized to process multispectral images, or
are automatically updated by this module. The neural network file that is created by this
module contains the weight vectors for each node in both ART modules, and the values
stored in the inter-ART field, which maps an ARTa classification node to an ARTb
classification node representing the output class. Figure 5.4.2 depicts the AVS network
that is used to create the required neural network files.

As  with  the  other  classifiers,  training  data  collected 
either  by  the  fuzzy  K-means  or  the  user  interactive 
module  is  transferred  in  the  form  of  a  linked  list  from  the 
Read  Training  module  to  the  Make  fuzzy  ARTMAP 
module.  The  data  then  is  moved  from  the  linked  list  to 
an  internal  data  structure  to  support  further  processing. 
The  control  panel  that  supports  the  user  interface  to  the 
Make  fuzzy  ARTMAP  module  is  depicted  in  Figure  5.4.3. 
Figure 5.4.2 - Depiction of the network to construct the parameter and network files

Figure 5.4.3 - Depiction of the control panel for the Make fuzzy ARTMAP module

Once the training data has been received and filenames for both the parameter and
network files have been selected, the interface is updated. It displays the number of
training sets or classes, the total number of training class pixels, and the dimensionality
of the training set. At this point, the user must select a value of the vigilance parameter
(ρ) for the ARTa module. Recall that this parameter determines the size of the region in
feature space associated with a given node. Increasing ρ decreases the maximum allowable
size of the hyper-rectangle in feature space, resulting in a finer-grained classification
and the creation of a large number of nodes. Conversely, if the user selects a smaller
value for the vigilance parameter, the classification regions in feature space grow and
fewer nodes need to be created. The "Recast inconsistent cases" button permits the
algorithm to check for inconsistencies in the training data. For example, if there were
some grass pixels in the tree class and a separate grass class, this preprocessing algorithm
would attempt to relabel the incorrect pixels. At this point, the user need only select a
value for ρ and either the "Generate network" or "Auto-generate network" button to create
the fuzzy ARTMAP neural network. The auto-generation feature permits the network to be
easily recreated for different values of the vigilance parameter. The module then enters
its compute function. As each of the training class pixels is processed, the region into
which it fits best with respect to the vigilance parameter must be determined. If the pixel
falls within a region corresponding to a classification node, no further action is needed.
If it is near a classification region, where the degree of "nearness" is controlled by ρ,
the weight vector of the node is updated and learning has occurred. If the pixel does not
fall in or near any classification region, a new node is created. Once all training class
pixels have been processed to 100% recognition, the interface is updated to reflect the
number of classification nodes, the number of learning iterations, and the elapsed
computation time. The network and parameter files are then written to disk.
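The node search, update, and creation steps described above can be sketched in a few
lines. This is a minimal illustration of a standard, simplified fuzzy ARTMAP training
loop and is not the code of the AVS module: the function names, the choice parameter
alpha, the learning rate beta, the match-tracking increment, and the assumption that
features are scaled to [0, 1] are all introduced here for illustration.

```python
import numpy as np

def complement_code(a):
    """Map features in [0, 1] to the complement-coded vector [a, 1 - a]."""
    a = np.asarray(a, dtype=float)
    return np.concatenate([a, 1.0 - a])

def train_fuzzy_artmap(samples, labels, rho, alpha=0.001, beta=1.0, max_epochs=20):
    """Simplified fuzzy ARTMAP training (hypothetical helper).

    Returns (weights, node_labels, epochs). Training stops after the first
    pass over the data in which no weight changes and no node is created.
    """
    weights, node_labels = [], []
    epoch = 0
    for epoch in range(1, max_epochs + 1):
        changed = False
        for a, label in zip(samples, labels):
            I = complement_code(a)
            rho_t = rho  # vigilance for this input; raised by match tracking
            # Rank existing nodes by the choice function, best first.
            order = sorted(
                range(len(weights)),
                key=lambda j: -np.minimum(I, weights[j]).sum() / (alpha + weights[j].sum()),
            )
            placed = False
            for j in order:
                overlap = np.minimum(I, weights[j])   # fuzzy AND
                match = overlap.sum() / I.sum()
                if match < rho_t:
                    continue                          # not "near" this node
                if node_labels[j] != label:
                    rho_t = match + 1e-6              # match tracking: wrong class
                    continue
                new_w = beta * overlap + (1.0 - beta) * weights[j]
                if not np.allclose(new_w, weights[j]):
                    weights[j] = new_w                # learning has occurred
                    changed = True
                placed = True
                break
            if not placed:                            # no suitable node: create one
                weights.append(I.copy())
                node_labels.append(label)
                changed = True
        if not changed:
            break
    return np.array(weights), node_labels, epoch
```

In this sketch a larger rho rejects more candidate nodes, so more nodes are created,
which mirrors the behavior of the vigilance parameter described in the text.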
Figure 5.4.4 - Depiction of the network to perform fuzzy
ARTMAP image classification in the AVS environment


At this point, image classification with the Fuzzy ARTMAP module can be
readily accomplished. Consider the network depicted in Figure 5.4.4.

Figure 5.4.5 - Depiction of the control panel for the fuzzy ARTMAP module

The AVS procedure reads in a multispectral image and permits the user to select a single
band to display the input image. A reference to the multispectral image data (in the form
of a pointer to shared memory) is transferred to the Fuzzy ARTMAP module. The control
panel supporting the user interface to this module is depicted in Figure 5.4.5. To perform
image classification, this non-parametric classifier requires both the parameter and
network files created by the Make ARTMAP module as previously discussed. Once the
user selects a valid parameter and network file, the data from both files is read from
disk and displayed on the standard output device. The module's interface is also updated
to reflect important classification parameters, including the value of the vigilance
parameter, the number of target classes, and the number of classification nodes. At this
point, the user need only select either the "Generate classification map" or
"Auto-generate classification map" button to continue. The auto-generation feature allows
the same image to be re-classified by just selecting different network and related
parameter files. The module then enters its compute function. The "Skip zero vectors"
toggle allows rapid processing of segmented images. It is important to realize that each
non-zero pixel in the input image must be compared to each classification node's weight
vector. If the pixel falls within the classification region of a node, it is assigned to
that class. If it does not fall within any classification region, it is assigned to the
background class. The value in the classification map at the same position as each input
pixel is then updated to reflect this assignment. Once the entire classification map is
constructed in the manner just described, the interface is updated to reflect the elapsed
computation time in seconds.

The network then constructs a colormap and applies it to the classification map so that the
classification results can be readily visualized. The data comprising the classification
map is then converted to support standard image formats and is optionally written to a
user defined file. As expected, classification time increases with image size and with the
number of classification nodes, which grows when the value of the vigilance parameter is
large.
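The per-pixel comparison against every node's weight vector can be illustrated as
follows. This is a hedged sketch under stated assumptions, not the module's code: it
assumes pixel values scaled to [0, 1], complement-coded weights such as a standard fuzzy
ART network would hold, class labels numbered from 1 with 0 reserved for the background,
and the usual vigilance test as the criterion for "falling within" a node's region. The
function name and the parameter alpha are hypothetical.

```python
import numpy as np

def classify_image(image, weights, node_labels, rho, background=0,
                   skip_zero=True, alpha=0.001):
    """Classify a (rows, cols, bands) image against fuzzy ART node weights.

    weights has shape (nodes, 2 * bands); node_labels gives each node's class.
    Pixels that pass no node's vigilance test go to the background class.
    """
    rows, cols, bands = image.shape
    pixels = image.reshape(-1, bands).astype(float)
    out = np.full(len(pixels), background, dtype=int)

    # "Skip zero vectors": all-zero pixels are never compared to any node.
    todo = pixels.any(axis=1) if skip_zero else np.ones(len(pixels), dtype=bool)
    coded = np.hstack([pixels[todo], 1.0 - pixels[todo]])   # complement coding

    # overlap[i, j] = |I_i AND w_j|; every pixel is compared to every node.
    overlap = np.minimum(coded[:, None, :], weights[None, :, :]).sum(axis=2)
    match = overlap / bands                 # |I| equals bands after complement coding
    choice = overlap / (alpha + weights.sum(axis=1))
    choice[match < rho] = -np.inf           # node region does not contain the pixel

    best = choice.argmax(axis=1)
    labels = np.asarray(node_labels)[best]
    labels[np.isneginf(choice.max(axis=1))] = background
    out[todo] = labels
    return out.reshape(rows, cols)
```

The cost is proportional to the number of non-zero pixels times the number of nodes,
which is why both image size and node count drive the elapsed time.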
5.5 Using the Confusion Matrix module in AVS

Once  the  images  have  been  classified  by  the  various  algorithms,  the  classification 
accuracy  needs  to  be  assessed.  This  is  a  simple  process  in  the  AVS  environment  after  a 
truth  image  has  been  constructed.  A  truth  image  is  a  classification  map  constructed  by 
hand  or  by  a  trusted  algorithm  that  reflects  ground  truth.  Consider  the  AVS  network  in 
Figure  5.5.1. 


Figure  5.5.1  -  Depiction  of  the  network  to  assess  classification  accuracy 

This network compares a classification map constructed by one of the classification
modules and a corresponding truth image. A colormap is constructed and applied to both
incoming images, and the resulting colored classification maps are displayed to aid visual
inspection. The classification map and truth data are then converted to a format
compatible with the Confusion Matrix module. This module checks to ensure that the
two data sets are comparable. Specifically, it checks that the image sizes are equal and
that the same number of classes are present in each image. Note that it is critical that
the truth image be connected to the leftmost port and that the classification map being
evaluated be connected to the rightmost port of the module. To aid in correct usage, the
control panel for the Confusion Matrix module is labeled to minimize errors.

Figure 5.5.2 - Depiction of the control panel for the Confusion Matrix module

The user interface for the module is depicted in Figure 5.5.2. Once the module has
received the reference to the input data and a filename for the confusion matrix, the user
need only select the "Calculate confusion matrix" or "Auto-generate confusion matrix"
button to begin operation. The auto-generation feature permits differing classification
maps from the various modules to be rapidly compared to a given truth image. When the
module enters the compute function, the confusion matrix is first initialized.
Corresponding pixels in the classification map and the truth image are then compared in
pairs. The value of the pixel in the truth image is used to index the column of the
confusion matrix, while the value of the pixel in the classification map is used to index
the row. If the values are the same, implying agreement between classification and truth,
an on-diagonal term is incremented. If the two values differ, the corresponding
off-diagonal term is incremented. Once the confusion matrix has been completely
determined, the simple accuracy, the weighted accuracy, the kappa coefficient, and Brennan
and Prediger's kappa are calculated and the interface is updated. The confusion matrix
along with the four figures of classification accuracy are written to the disk file. A
sample output file follows in Figure 5.5.3.
This is the confusion matrix:

262050     642     332       1      64
    76   91922       4      81       0
   173     861   32463      35     121
   180    1430     252   16564    1292
   170       0       0       0    9363

The simple accuracy = 0.986097
The weighted accuracy = 0.961072
The kappa coefficient = 0.974530
Brennan and Prediger's kappa coefficient = 0.982621

Figure 5.5.3 - Sample output from the Confusion Matrix module
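The indexing convention and three of the four accuracy figures can be sketched as
follows. This is an illustrative sketch, not the module's source: it assumes class values
numbered from 0, uses the standard definitions of Cohen's kappa and Brennan and
Prediger's kappa, and omits the weighted accuracy because its definition is not restated
in this section. The function names are hypothetical.

```python
import numpy as np

def confusion_matrix(truth, classified, n_classes):
    """Rows are indexed by the classification map, columns by the truth image."""
    m = np.zeros((n_classes, n_classes), dtype=np.int64)
    # np.add.at accumulates correctly when the same cell is hit many times.
    np.add.at(m, (np.ravel(classified), np.ravel(truth)), 1)
    return m

def accuracy_metrics(m):
    """Return (simple accuracy, kappa, Brennan and Prediger's kappa)."""
    m = np.asarray(m, dtype=float)
    n = m.sum()
    k = m.shape[0]
    po = np.trace(m) / n                                   # observed agreement
    pe = (m.sum(axis=0) * m.sum(axis=1)).sum() / n ** 2    # chance agreement
    kappa = (po - pe) / (1.0 - pe)
    bp_kappa = (po - 1.0 / k) / (1.0 - 1.0 / k)            # chance fixed at 1/k
    return po, kappa, bp_kappa
```

With five classes, a simple accuracy of 0.986097 gives a Brennan and Prediger's kappa of
(0.986097 - 0.2) / 0.8 = 0.982621, which agrees with the sample output in Figure 5.5.3.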




6.0  Image  Classification  Comparison 

The color plates in Figures 6.0.1 through 6.0.4 are the test images for the various
classification algorithms. The images include a range of class types from rural
agricultural settings to densely populated urban scenes. All are six-band multispectral
images. Three are from the M7 airborne sensor and the remaining image (Figure 6.0.3) is
from the LANDSAT satellite. Table 6.0.1 below reports the important statistics of each
scene.


Table 6.0.1 - Test image statistics

image name       size (H x V)   sensor    bands
city.lan         704 x 594      M7        3, 6, 8, 10, 12 & 13
landcover.lan    676 x 701      M7        3, 6, 8, 10, 12 & 13
roch84.lan       512 x 512      LANDSAT   1-5 & 7
seashore.lan     500 x 500      M7        3, 6, 8, 10, 12 & 13

Three  major  classification  tasks  were  undertaken  to  compare  the  classification 
results  derived  from  the  varying  approaches.  In  the  following  sections,  the  restrictions 
imposed  upon  and  the  desired  goals  of  each  task  will  be  described.  The  procedure  for 
how  the  images  were  classified,  along  with  the  elapsed  time  and  accuracy  measurements, 
will  be  presented  for  each  algorithm  accompanied  by  classification  maps  for  visual 
inspection. 
Figure 6.0.1
landcover.lan


6.1  Task  1:  Discussion  and  Results 


The  first  task  undertaken  in  this  study  involved  classifying  approximately  95%  of 
each  image  using  training  data  collected  from  the  fuzzy  K-means  algorithm.  The  goal  of 
this  task  is  to  highlight  the  ability  of  each  algorithm  to  classify  broad  target  categories 
given  trusted  data  for  each  class. 

The  cluster  shift  limit  parameter  of  the  fuzzy  K-means  module  was  set  to  four  for 
all  operations.  This  implies  that  all  cluster  centers  must  have  moved  less  than  a  total  of 
four  pixels  in  the  six-dimensional  feature  space  before  convergence  can  occur.  The 
unsupervised  classifier  was  set  first  to  a  large  membership  value,  typically  0.9  or  0.85,  to 
collect  the  trusted  training  data  for  each  of  the  classifiers.  The  high  membership  value 
used to construct the training data was adjusted so that there are at least 60D
pixels per class, where D is the number of bands, to ensure valid statistical calculations.
The  clustering  operation  was  repeated  with  the  classifier  reset  to  a  lower  membership 
value,  typically  0.3  to  0.4,  to  form  a  truth  image.  The  lower  membership  value  was 
manipulated  to  ensure  that  approximately  95%  of  the  image  was  classified.  The  value  for 
the  number  of  cluster  centers  was  estimated  from  visual  inspection.  Recall  that  the 
cluster  centers  are  initially  selected  in  a  pseudorandom  manner.  This  ensures  that  the 
same  cluster  centers  will  be  found  regardless  of  membership  value.  Utilizing  the  fuzzy 
K-means  clustering  algorithm  with  a  low  membership  value  provides  an  easy  method  to 
construct  truth  images  for  comparison  purposes.  The  use  of  the  algorithm  to  produce 
truth  images  can  be  justified  by  realizing  that  this  approach  produces  cluster  centers,  and 
the ensuing classification is based entirely upon naturally occurring "clumps" of pixels in
feature  space.  The  resulting  truth  images  also  support  visual  intuition  of  class 
membership  and  extent.  The  results  of  the  clustering  operation  are  summarized  in  Table 
6.1.1. 
Table 6.1.1 - Fuzzy K-means clustering statistics

image name       number of cluster   membership    number of    elapsed time
                 centers             values        iterations   (sec)
city.lan         6                   0.9 & 0.3     17           2,442
landcover.lan    6                   0.9 & 0.3     7            1,362
roch84.lan       5                   0.85 & 0.3    6            635
seashore.lan     5                   0.85 & 0.4    5            490

As  expected,  locating  clusters  of  pixels  within  a  multispectral  image  is  computationally 
intensive.  The  wide  distribution  of  both  elapsed  time,  and  the  related  value  for  the 
number  of  iterations  to  reach  convergence,  is  a  function  of  image  size,  complexity  or 
content,  and  desired  number  of  cluster  centers.  The  reported  elapsed  time  is  the  result  of 
averaging  numerous  clustering  operations,  but  very  low  variation  was  observed. 
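The use of a high membership value for training data and a low one for the truth image,
as described earlier in this section, can be sketched as a threshold on fuzzy membership
grades. The sketch below uses the standard fuzzy c-means membership formula with a
fuzzifier m = 2 and Euclidean distance; those choices, the function names, and the
convention that label 0 means "unclassified" are assumptions made here for illustration
and are not taken from the fuzzy K-means module itself.

```python
import numpy as np

def fuzzy_memberships(pixels, centers, m=2.0):
    """Membership grade of each pixel in each cluster (rows sum to 1)."""
    d = np.linalg.norm(pixels[:, None, :] - centers[None, :, :], axis=2)
    d = np.maximum(d, 1e-12)                 # avoid division by zero at a center
    inv = d ** (-2.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)

def threshold_labels(u, threshold):
    """1-based cluster index where the top membership meets the threshold, else 0."""
    best = u.argmax(axis=1)
    return np.where(u.max(axis=1) >= threshold, best + 1, 0)

def enough_training_pixels(labels, n_classes, bands, factor=60):
    """Check the rule of at least 60 * D training pixels per class."""
    counts = np.bincount(labels, minlength=n_classes + 1)[1:]
    return bool((counts >= factor * bands).all())
```

Running `threshold_labels` once with a value such as 0.9 keeps only pixels close to a
cluster center (trusted training data), while a value such as 0.3 labels most of the
image (the truth image) using the same cluster centers.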

After  constructing  both  training  data  sets  and  truth  images,  image  classification 
was  readily  accomplished.  For  GML  classification,  the  training  data  from  the  clustering 
algorithm  must  first  be  operated  upon  to  construct  the  parametric  model  of  the  data  which 
is  stored  in  a  statistics  file.  The  detailed  statistics  files  for  each  training  set  can  be  found 
in  appendix  B.  Table  6.1.2  summarizes  the  number  of  classes  in  each  training  set,  the 
number  of  points  in  each  training  set,  and  the  elapsed  time  to  create  the  statistics  file  for 
each training set. Note that the required time to create the statistics file depends upon both
the  number  of  classes  and  the  number  of  points  in  each  training  set. 


Table 6.1.2 - Summary of GML statistics file parameters

image name       number of classes   number of points in   elapsed time to create
                                     training set          statistics file (sec)
city.lan         6                   150,874               11
landcover.lan    6                   44,791                3
roch84.lan       5                   27,177                2
seashore.lan     5                   124,851               7
With a mathematical model that describes the distribution of the training data
constructed, image classification with GML is a straightforward process. The value of
chi squared (χ²) was manipulated until the target classification percentage of 95% was
achieved. Note that the large values of the χ² distance can be explained by realizing that
the training data was collected at a high membership value. As such, it is tightly
clustered in feature space. The large value of this distance parameter permits the
resulting hyperellipsoids to expand to encompass the full extent of all classes. The
reported classification accuracy is the simple accuracy, and the truth image for comparison
purposes was created with the fuzzy K-means algorithm. Complete confusion matrices
for each of the images are available for detailed inspection in appendix B. The GML
classification results are summarized in Table 6.1.3.


Table 6.1.3 - Summary of GML classification statistics

image name       value of χ²   elapsed classification   percentage of image   classification
                               time (sec)               classified            accuracy
city.lan         256           60                       94.86                 88
landcover.lan    350           61                       95.1                  85
roch84.lan       160           29                       95.53                 86
seashore.lan     145           27                       95.9                  90
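The role of the χ² limit in GML classification can be illustrated with a short sketch.
It is a generic Gaussian maximum likelihood classifier with equal priors, in which a pixel
is sent to the background when its squared Mahalanobis distance to the winning class
exceeds the limit. The function names, the equal-prior assumption, and the exact form of
the rejection rule are assumptions for illustration and may differ from the GML module.

```python
import numpy as np

def class_statistics(training_sets):
    """Mean vector, inverse covariance, and log-determinant for each class."""
    stats = []
    for x in training_sets:                      # x has shape (points, bands)
        mean = x.mean(axis=0)
        cov = np.cov(x, rowvar=False)
        stats.append((mean, np.linalg.inv(cov), np.log(np.linalg.det(cov))))
    return stats

def gml_classify(pixels, stats, chi2_limit, background=0):
    """Label pixels 1..K by maximum likelihood; reject those beyond chi2_limit."""
    n = len(pixels)
    best_score = np.full(n, np.inf)
    best_d2 = np.full(n, np.inf)
    label = np.full(n, background, dtype=int)
    for k, (mean, inv_cov, log_det) in enumerate(stats, start=1):
        diff = pixels - mean
        d2 = np.einsum('ij,jk,ik->i', diff, inv_cov, diff)   # squared Mahalanobis
        score = log_det + d2         # -2 log-likelihood up to a constant
        better = score < best_score
        best_score[better] = score[better]
        best_d2[better] = d2[better]
        label[better] = k
    label[best_d2 > chi2_limit] = background     # outside the hyperellipsoid
    return label
```

Raising `chi2_limit` enlarges every class hyperellipsoid, so a larger share of the image
is classified, which is how the 95% target was approached in Table 6.1.3.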

The  images  were  classified  next  with  the  nPDF  algorithm  utilizing  the  same 
training  data.  The  training  data  was  first  projected  in  varying  nPDF  spaces  until  no 
clusters  overlapped,  or  until  any  such  overlap  was  minimized.  Recall  that  nPDF  space  is 
determined  by  both  a  hypercube  corner  pair  and  scale  factor  selection.  The  original 
image  was  then  projected  utilizing  the  same  parameters  so  that  the  extent  of  each  class  in 
the  projected  space  could  be  determined.  This  information  was  then  used  to  build  the 
classification LUT. This process is depicted in Figures 6.1.1 through 6.1.4.
Figure 6.1.1 - nPDF development for the city.lan image (panels: nPDF projection of city
image; nPDF projection of city training data; LUT created from training data)

Figure 6.1.2 - nPDF development for the landcover.lan image (panels: nPDF projection of
landcover image; nPDF projection of landcover training data; LUT created from training
data)

Figure 6.1.3 - nPDF development for the roch84.lan image (panels: nPDF projection of
rochester image; nPDF projection of rochester training data; LUT created from training
data)

Figure 6.1.4 - nPDF development for the seashore.lan image (panels: nPDF projection of
seashore image; nPDF projection of seashore training data; LUT created from training data)


The  important  nPDF  LUT  creation  statistics  are  summarized  in  Table  6.1.4. 


Table 6.1.4 - Summary of nPDF LUT statistics

image name       hypercube   scale factor   projection time for    projection time for
                 corners                    training data (sec)    image (sec)
city.lan         1 & 4       512            15                     51
landcover.lan    1 & 4       512            4                      59
roch84.lan       1 & 2       1,024          6                      35
seashore.lan     1 & 3       512            14                     32

Note  that  the  elapsed  time  is  determined  by  the  number  of  pixels  in  the  image  or  training 
set. The number of data classes has no effect on either the projection or classification
time,  but  it  does  make  constructing  the  LUT  more  difficult  and  time  consuming.  In 
addition,  it  is  important  to  note  that  the  only  LANDSAT  image  in  the  test  group  required 
the  largest  scale  factor,  and  (as  we  shall  see)  has  the  lowest  classification  accuracy.  The 
reason  is  the  fixed  gain  of  the  LANDSAT  TM  sensor,  which  does  not  effectively  utilize 
the  available  dynamic  range  of  digital  count  values,  while  the  exposure  control  of  the  M7 
sensor  does  so.  LANDSAT  TM  images  will  therefore  always  have  lower  dynamic  ranges 
than  comparable  M7  images.  In  nPDF  projection  space,  this  will  manifest  itself  as  more 
densely  packed  class  clusters.  In  turn,  this  forces  larger  scale  factor  values  to  drive  the 
class  clusters  apart.  Since  the  class  clusters  are  closer  together,  any  errors  in  the  LUT  that 
designates  boundaries  between  the  classes  will  lead  to  larger  classification  error  and 
lower  classification  accuracy.  The  images  were  then  classified  with  the  LUTs  and 
associated  parameters  just  described.  Reported  classification  accuracies  reflect  the  simple 
accuracy  metric  with  respect  to  the  fuzzy  K-means  derived  truth  image.  Table  6.1.5 
summarizes  the  important  classification  parameters. 
Table 6.1.5 - Summary of nPDF classification statistics

image name       elapsed classification   percentage of image   classification
                 time (sec)               classified            accuracy
city.lan         53                       95                    88
landcover.lan    56                       93                    78
roch84.lan       34                       91                    70
seashore.lan     31                       93                    86

Note  that  it  was  not  possible  to  achieve  the  target  image  classification  percentage  in  all 
cases  with  this  classification  approach.  This  is  due  to  the  fact  that  it  is  very  difficult  to 
construct  LUTs  with  boundaries  that  perfectly  adjoin  without  overlap.  Any  spacing 
between  class  polygon  regions  results  in  pixels  that  should  have  been  assigned  to  a  class 
being  improperly  relegated  to  the  background.  Relatively  low  classification  accuracies 
can  be  attributed  to  the  inherent  data  dimensionality  reduction  of  the  nPDF  projection 
operation.  In  moving  from  six-  to  two-dimensional  space,  information  is  irretrievably 
lost  which  leads  to  lower  classification  accuracies. 

The images were then classified with the fuzzy ARTMAP neural network
utilizing the same training data as the previous classification methods. As with the GML
approach, the neural network must construct a mathematical model of the data. Note that
the elapsed time to create the neural network depends upon the value of the vigilance
parameter ρ, the number of training points and their variance, the number of training
classes, and the resulting number of classification nodes created. This statement may not
be readily evident, but it is easily explained. As the algorithm is creating the artificial
neural network, it must evaluate the membership of each training exemplar in the training
set with respect to each of the classification nodes. As the number of classification nodes
created increases (recall that this number is driven by the variance of the training set and
the value of ρ), more and more sets of calculations are needed for each training exemplar.
The value of ρ was adjusted to achieve the target classification percentage of 95%.
Increasing the vigilance parameter would have produced a smaller hyper-rectangle in
feature space leading to more accurate classification, but this would have occurred at the
expense of classifying less of the input image. The fuzzy ARTMAP network creation
statistics are summarized in Table 6.1.6.


Table 6.1.6 - Summary of fuzzy ARTMAP network statistics

image name       ρ for ARTa   number of ARTa   number of learning   elapsed time to create
                              nodes            iterations           network (sec)
city.lan         0.88         6                2                    51
landcover.lan    0.91         6                2                    16
roch84.lan       0.91         5                2                    8
seashore.lan     0.9          12               3                    72

Once each neural network had been constructed, the images were classified. All values in
the associated parameter files were identical with the exception of the size of the training
set, the number of classification nodes, the number of target classes, and the value of the
minimum vigilance parameter. Resulting classification time is affected by the image size
and the number of classification nodes in the neural network. This is most evident in the
seashore.lan image test case. Though the image is the smallest test image, the
combination of the variance of the training classes and value of the vigilance parameter
leads to the creation of multiple classification nodes for each class. This leads to the
largest elapsed time for network creation and image classification. As the image is
classified,  each  pixel  must  have  its  membership  evaluated  for  each  classification  node. 

As  expected,  this  can  greatly  increase  classification  time.  Table  6.1.7  summarizes  the 
performance  of  fuzzy  ARTMAP  image  classification.  The  accuracy  value  reported  is  the 
simple  accuracy  with  respect  to  the  truth  image  created  by  the  fuzzy  K-means  algorithm. 
Table 6.1.7 - Summary of fuzzy ARTMAP classification statistics

image name       elapsed time to classify   percentage of image   classification
                 image (sec)                classified            accuracy
city.lan         41                         97                    91
landcover.lan    46                         96                    89
roch84.lan       28                         96                    90
seashore.lan     46                         96                    93

For  comparison  purposes,  Table  6.1.8  summarizes  the  classification  times  and 
accuracies  for  the  different  classification  algorithms.  The  classification  time  for  the  GML 
approach  reflects  the  sum  of  the  time  required  to  make  the  statistics  file  and  classify  the 
image.  Similarly  the  time  reported  for  the  fuzzy  ARTMAP  approach  reflects  the  time 
required  to  create  the  neural  network  and  classify  the  image.  The  reported  time  for  the 
nPDF  classification  approach  reflects  the  sum  of  the  time  to  project  the  training  data  and 
the  original  image.  The  time  required  to  create  the  LUT  is  not  reported.  This  procedure 
should  not  be  rushed  as  classification  accuracy  is  wholly  dependent  upon  it.  It  can  be 
realistically  estimated  to  require  approximately  20  minutes  to  construct  a  five-  or 
six-class  nPDF  LUT. 


Table 6.1.8 - Summary of classification times and accuracies

image name       GML time   GML        nPDF time   nPDF       ARTMAP       ARTMAP
                 (sec)      accuracy   (sec)       accuracy   time (sec)   accuracy
city.lan         71         88         104         88         92           91
landcover.lan    67         85         115         78         62           89
roch84.lan       31         86         71          70         36           90
seashore.lan     40         90         63          86         118          93


To  aid  in  comparison,  Figure  6.1.5  graphically  displays  the  classification  accuracy 
and  Figure  6.1.6  depicts  the  required  image  classification  time. 


Figure 6.1.5 - Plot of task 1 classification accuracies (horizontal axis: image; city,
landcover, roch84, seashore)

Figure 6.1.6 - Plot of task 1 elapsed classification times (horizontal axis: image)


Figure  6.1.7  was  formed  by  computing  the  ratio  of  the  classification  accuracies  to  the 
required  training/classification  time.  The  objective  of  this  plot  is  to  visualize  the 
performance  of  each  algorithm  with  respect  to  accuracy  and  required  classification  time. 
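The ratio plotted in Figure 6.1.7 is simple arithmetic on the values of Table 6.1.8. A
minimal sketch follows; the dictionary layout and the function name are hypothetical, and
the numbers are the times (sec) and simple accuracies as listed in that table.

```python
# (time in seconds, accuracy) for each algorithm, as listed in Table 6.1.8.
task1 = {
    "city.lan":      {"GML": (71, 88), "nPDF": (104, 88), "ARTMAP": (92, 91)},
    "landcover.lan": {"GML": (67, 85), "nPDF": (115, 78), "ARTMAP": (62, 89)},
    "roch84.lan":    {"GML": (31, 86), "nPDF": (71, 70),  "ARTMAP": (36, 90)},
    "seashore.lan":  {"GML": (40, 90), "nPDF": (63, 86),  "ARTMAP": (118, 93)},
}

def accuracy_per_second(results):
    """Ratio of classification accuracy to training/classification time."""
    return {
        image: {alg: acc / seconds for alg, (seconds, acc) in algs.items()}
        for image, algs in results.items()
    }

ratios = accuracy_per_second(task1)
```

A higher ratio indicates more accuracy obtained per second of processing, so the measure
rewards fast classifiers as well as accurate ones.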


Figure  6.1.7  -  Plot  of  task  1  ratio  of  accuracy  to  classification  time 

Note  that  the  fuzzy  ARTMAP  classifier  consistently  produced  the  greatest 
classification accuracy. When the number of ARTa classification nodes equaled the
number of classes of data, the algorithm's classification time compared favorably with
that of the GML. When the vigilance parameter is increased, or there is great variation in
the  data,  its  performance  suffers  due  to  the  number  of  calculations  that  must  be  performed 
on  each  individual  pixel.  The  nPDF  algorithm  consistently  produced  the  lowest 
classification  accuracies  coupled  with  the  greatest  classification/training  time.  This  can 
be  attributed  to  the  inherent  smearing  and  loss  of  data  present  in  any  projection  technique. 
This  approach  is  not  without  merit.  Its  greatest  strength  lies  in  its  data  visualization 
properties. 

A variety of classification metrics were developed for this study. The various
metrics were graphed for each image. This graph is presented in appendix C. For this task,
the classification accuracy metrics were consistently found to be ordered with simple
accuracy highest, followed by Brennan and Prediger's kappa and then the standard kappa
coefficient. For this reason, only the simple accuracy metric was reported.

The color plates in Figures 6.1.8 through 6.1.11 depict the classification maps for
each  image  and  each  classification  methodology.  All  colormaps  are  encoded  in  the  same 
manner.  Class  1  is  red,  class  2  is  blue,  class  3  is  green,  class  4  is  purple,  class  5  is  yellow, 
and  class  6  is  cyan.  In  this  way  the  output  from  the  class  statistics  module  can  be  visually 
coupled  with  class  statistics  information. 
Figure 6.1.10 - Task 1 classification maps for the roch84.lan image (panel titles
recovered: fuzzy ARTMAP; nPDF)
6.2 Task 2: Hybrid Classification Discussion and Results


Task  two  of  this  study  utilized  a  hybrid  classification  methodology.  In  this 
scheme,  the  input  image  is  first  segmented  to  produce  an  image  composed  of  tightly 
clustered  data,  and  this  image  is  then  classified.  The  nPDF  algorithm  was  used  to 
segment  the  city.lan  and  landcover.lan  images.  The  water  class  from  the  city  image  and 
two  tightly  intermingled  vegetation  classes  from  the  landcover  image  were  used  to  create 
a  segmented  image  comprised  of  just  the  classes  of  interest.  These  segmented  images 
were  then  processed  by  the  fuzzy  K-means  algorithm  to  form  trusted  training  classes  and 
truth images with the method described in section 6.1 of this report. The training data for
the  images  were  passed  to  the  GML  and  fuzzy  ARTMAP  classifiers  where  the  segmented 
image  was  classified  by  each  algorithm.  Note  that  since  numerous  classes  were  made 
from  tightly  clustered  data,  the  resulting  training  class  sets  will  have  mean  vectors  in 
close  proximity  to  each  other.  The  goals  for  this  task  were  to  experiment  with  hybrid 
classification,  classify  more  than  95%  of  the  segmented  image,  and  to  test  the 
performance  of  the  algorithms  when  operating  upon  data  that  is  not  well  separated  in 
feature  space. 

The hybrid classification methodology should produce higher classification
accuracies and lower elapsed times. This expectation is partly due to the fact that only a
portion of the image needs to be processed. Because of this, classification times will be
reduced. The "Skip zero vectors" feature of the classifier modules will be used to
support this operation. In addition, little extraneous data is passed to the classification
stage from the segmentation phase. Decreased misclassification should therefore be
observed in both algorithms. As such, the subsequent classification algorithms can
"concentrate" on the detailed classification task at hand.
As previously stated in the introduction, the nPDF algorithm was used first to
segment the city and landcover images to form the city_water and seg_landcover
multispectral  images.  This  algorithm  is  most  appropriate  for  the  segmentation  task  as  it 
lends  itself  to  situations  where  class  separation  is  large  and  classification  accuracy  is  not 
paramount.  This  statement  can  be  readily  reinforced  by  reviewing  the  performance  of  the 
algorithm  in  task  one.  Select  portions  of  the  LUTs  used  in  this  task  were  reused  to  create 
the  segmentation  LUTs.  In  an  image  segmentation  mode,  the  nPDF  algorithm  functions 
similarly  to  that  used  in  a  classification  role  with  one  important  difference.  Instead  of 
producing  a  classification  map,  this  operation  produces  a  new  multispectral  image 
composed  of  pixels  that  fall  within  the  classification  boundaries  of  the  LUT  in  the 
projected  feature  space.  Any  pixels  that  did  not  fall  within  the  boundaries  of  the  LUT 
classification  regions  were  assigned  zero  vectors  in  the  resulting  output  images.  Figure 
6.2.1  on  the  following  page  depicts  the  nPDF  approach  to  image  segmentation  and  Table 
6.2.1  highlights  the  important  statistics  for  the  segmentation  operation. 


Table 6.2.1 - nPDF segmentation statistics

image name          hypercube   scale factor   segmentation time for   percentage of image
                    corners                    image (sec)             segmented
city_water.lan      1 & 4       512            54                      37.1
seg_landcover.lan   1 & 4       512            57                      39.5
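The segmentation step described above, in which pixels outside the LUT regions become
zero vectors, can be sketched as a masking operation. The sketch assumes that the nPDF
projection has already been computed elsewhere and is supplied as integer (x, y)
coordinates per pixel, and that the segmentation LUT is a two-dimensional boolean array
indexed as [y, x]; these data layouts and the function name are hypothetical.

```python
import numpy as np

def segment_with_lut(image, proj_xy, lut_mask):
    """Keep pixels whose projected position falls inside the LUT regions.

    image:    (rows, cols, bands) multispectral data
    proj_xy:  (rows, cols, 2) integer projected coordinates, stored as (x, y)
    lut_mask: 2-D boolean array, True inside the regions of interest
    Returns the segmented image and the percentage of the image retained.
    """
    keep = lut_mask[proj_xy[..., 1], proj_xy[..., 0]]
    segmented = np.where(keep[..., None], image, 0)   # zero vectors elsewhere
    return segmented, 100.0 * keep.mean()
```

The zero vectors written here are what the "Skip zero vectors" toggle of the classifier
modules later uses to avoid processing pixels outside the classes of interest.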

The resulting segmented images were then clustered with the fuzzy K-means
algorithm to create truth images and training data. Once again, the high membership
value was manipulated to ensure statistically sound training sets. Similarly, the low
membership value was varied to classify the vast majority of the image to form a truth
image. The important parameters from the clustering operation are summarized in Table
6.2.2.

Figure 6.2.1 - nPDF segmentation LUT development (panels: nPDF projection of city image; nPDF projection of landcover image; segmentation LUT)


Table 6.2.2 - Clustering statistics for hybrid classification

image name          number of   membership   number of    elapsed      percentage of
                    cluster     values       iterations   time (sec)   image classified
                    centers
-----------------   ---------   ----------   ----------   ----------   ----------------
city_water.lan      4           0.85 & 0.3   12           711          37.1
seg_landcover.lan   5           0.85 & 0.3   12           1,041        39.5
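
The role of the two membership values in Table 6.2.2 can be illustrated with a short sketch. It assumes the standard fuzzy c-means membership formula with a fuzzifier of m = 2, which may differ from the exact fuzzy K-means implementation used in this study; the function names and the toy cluster centers are hypothetical.

```python
import math

def memberships(pixel, centers, m=2.0):
    """Fuzzy c-means style membership of one pixel in each cluster (assumed form)."""
    d = [math.dist(pixel, c) for c in centers]
    for k, dk in enumerate(d):
        if dk == 0.0:                       # pixel coincides with a center
            return [1.0 if j == k else 0.0 for j in range(len(d))]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dj) ** power for dj in d) for di in d]

def label_pixel(pixel, centers, threshold):
    """Return the 1-based cluster index, or 0 if the best membership is too low."""
    u = memberships(pixel, centers)
    best = max(range(len(u)), key=u.__getitem__)
    return best + 1 if u[best] >= threshold else 0

centers = [(0.0, 0.0), (10.0, 0.0)]
pure = label_pixel((1.0, 0.0), centers, 0.85)        # close to center 1
mixed_high = label_pixel((4.0, 0.0), centers, 0.85)  # rejected as training data
mixed_low = label_pixel((4.0, 0.0), centers, 0.3)    # accepted for the truth image
```

A high threshold such as 0.85 retains only pixels near a cluster center, which suits training data, while a low threshold such as 0.3 labels nearly every pixel, which suits a truth image.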

Once  the  training  data  was  obtained,  a  parametric  model  of  the  data  was 
constructed  for  the  GML  classifier.  Table  6.2.3  summarizes  the  number  of  pixels  in  each 
set,  the  number  of  classes  in  each  training  set  and  the  time  required  to  calculate  the  mean 
vectors,  the  inverses  of  the  variance-covariance  matrices,  and  their  determinants. 


Table 6.2.3 - Training class statistics for hybrid GML classification

image name          number of   number of points   elapsed time to create
                    classes     in training set    statistics file (sec)
-----------------   ---------   ----------------   ----------------------
city_water.lan      4           52,458             3
seg_landcover.lan   5           22,788             2
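
The quantities stored in the statistics file can be sketched as follows. This is a minimal NumPy illustration, not the AVS module used in the study; the function name class_statistics and the toy samples are hypothetical, and the unbiased (n - 1) covariance estimate is an assumption.

```python
import numpy as np

def class_statistics(samples):
    """Return mean vector, inverse covariance, and determinant for one class.

    samples : (n_pixels, n_bands) training pixels belonging to a single class
    """
    mean = samples.mean(axis=0)
    cov = np.cov(samples, rowvar=False)   # variance-covariance matrix (n - 1 form)
    return mean, np.linalg.inv(cov), np.linalg.det(cov)

# Toy two-band class: four pixels at the corners of a square.
samples = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]])
mean, inv_cov, det = class_statistics(samples)
```

Computing the inverse and determinant once per class, rather than once per pixel, is what keeps the elapsed times in Table 6.2.3 small relative to the classification step.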

With a parametric model describing the distribution of the data, GML classification was
readily accomplished. The value of the distance threshold was manipulated to achieve the
target image classification percentage of 95%. A percentage less than 100% was utilized to
provide a fair comparison between the performance of the two classifiers. This was
accomplished by forcing both the neural network and the GML classifier to reject
the outlying data elements. If 100% classification had been used instead, the GML
approach would have suffered from errors caused by the improper inclusion of pixels that
lie at large distances from a class. Similarly, the run-time performance of the ARTMAP classifier
would have been artificially enhanced because the ARTMAP would have maximized its
generalization of feature space division: fewer recognition regions require fewer weight
vectors to describe them and fewer calculations on each image pixel. The fuzzy
ARTMAP was forced to minimize its generalization of feature space by utilizing high
values for the vigilance parameter. The reported classification accuracy metric is the
simple accuracy, computed with respect to the truth image created with the fuzzy
K-means approach. Important classification results are summarized in Table 6.2.4.


Table 6.2.4 - Hybrid GML classification results

image name          distance    elapsed          percentage of      classification
                    threshold   classification   image classified   accuracy (%)
                                time (sec)
-----------------   ---------   --------------   ----------------   --------------
city_water.lan      65          15               95.6               93
seg_landcover.lan   90          26               95.2               92

Note  that  the  values  of  the  distance  are  considerably  smaller  than  the  values  that  were 
utilized  in  task  one.  This  can  be  readily  explained  by  realizing  that  the  data  is  tightly 
clustered  in  feature  space  and  has  considerably  lower  spectral  extent.  As  such,  GML's 
hyperelliptical  classification  regions  need  only  increase  a  moderate  amount  to  accomplish 
the  desired  percentage  of  image  classification.  Note  that  the  reported  value  is  with 
respect  to  the  portion  of  the  segmented  image  composed  of  non-zero  pixel  intensity 
vectors. 
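
The effect of the distance threshold on GML classification can be sketched as below. The sketch assumes equal class priors, applies the threshold to the squared Mahalanobis distance of the winning class, and skips zero vectors; whether the study applied its threshold in exactly this form is an assumption, and the function name gml_classify and the toy statistics are hypothetical.

```python
import numpy as np

def gml_classify(pixels, stats, threshold):
    """Label pixels by Gaussian maximum likelihood with outlier rejection.

    pixels : (n, bands) array; stats : list of (mean, inv_cov, det) per class.
    Returns 1-based labels; 0 marks the background (unclassified) class.
    """
    n = len(pixels)
    scores = np.empty((n, len(stats)))
    dists = np.empty_like(scores)
    for k, (mean, inv_cov, det) in enumerate(stats):
        diff = pixels - mean
        d2 = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)  # squared Mahalanobis
        dists[:, k] = d2
        scores[:, k] = -np.log(det) - d2    # log-likelihood up to constants
    best = scores.argmax(axis=1)
    labels = best + 1
    labels[dists[np.arange(n), best] > threshold] = 0   # reject outlying pixels
    labels[~pixels.any(axis=1)] = 0                     # skip zero vectors
    return labels

stats = [(np.array([10.0, 10.0]), np.eye(2), 1.0),
         (np.array([50.0, 50.0]), np.eye(2), 1.0)]
pixels = np.array([[11.0, 10.0], [49.0, 52.0], [30.0, 30.0], [0.0, 0.0]])
labels = gml_classify(pixels, stats, threshold=9.0)
```

Raising the threshold enlarges each hyperelliptical region and so increases the percentage of the image classified, which is the adjustment described in the text.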

Classification with the fuzzy ARTMAP algorithm was then accomplished
utilizing the same training data as for the GML approach. Since the training data are
tightly clustered in feature space and have a small spectral extent, high values of the
vigilance parameter ρ were necessary to achieve the desired percentage of image
classification. As expected, a large value for the vigilance parameter produces a neural network
with finely grained recognition regions in feature space. Recall that each classification
region is defined by the weight vector of a node. In this case, multiple nodes were
needed to encompass the spectral extent of the target classes. Table 6.2.5 summarizes the
important training statistics.

Table 6.2.5 - Hybrid fuzzy ARTMAP network statistics

image name          ρ for    number of     number of learning   elapsed time to create
                    ART_a    ART_a nodes   iterations           network (sec)
-----------------   ------   -----------   ------------------   ----------------------
city_water.lan      0.97     9             2                    15
seg_landcover.lan   0.94     10            2                    8
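
The way the vigilance parameter limits the size of a recognition region can be illustrated with the fuzzy ART match test. The sketch below assumes the standard formulation with complement coding and the fuzzy AND (component-wise minimum); the function names and the toy weight vector are hypothetical and are not taken from the network built in this study.

```python
def complement_code(x):
    """x: band values scaled to [0, 1]; returns the vector (x, 1 - x)."""
    return list(x) + [1.0 - v for v in x]

def match(i_vec, w):
    """Fuzzy ART match function |I ^ w| / |I| (fuzzy AND = component-wise min)."""
    return sum(min(a, b) for a, b in zip(i_vec, w)) / sum(i_vec)

def passes_vigilance(x, w, rho):
    """True if the node with weight vector w may accept input x at vigilance rho."""
    return match(complement_code(x), w) >= rho

# Node that has so far learned the single point (0.5, 0.5).
w = complement_code([0.5, 0.5])
near_ok = passes_vigilance([0.52, 0.5], w, rho=0.97)       # match is about 0.99
far_rejected = passes_vigilance([0.6, 0.5], w, rho=0.97)   # match is about 0.95
far_ok_lower = passes_vigilance([0.6, 0.5], w, rho=0.94)
```

With complement coding, a higher ρ forces each hyper-rectangle to stay small, so more nodes are needed to cover a class, which is consistent with the node counts reported in Table 6.2.5.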

Once  the  neural  network  was  created,  image  classification  was  easily 
accomplished.  Rapid  image  classification  was  realized  because  only  the  non-zero  pixel 
intensity  vectors  must  be  evaluated.  The  reported  accuracy  metric  is  the  simple  accuracy 
with  respect  to  the  fuzzy  K-means  truth  image.  Important  classification  statistics  are 
presented  in  Table  6.2.6  below. 


Table 6.2.6 - Hybrid fuzzy ARTMAP classification results

image name          elapsed time to        percentage of      classification
                    classify image (sec)   image classified   accuracy (%)
-----------------   --------------------   ----------------   --------------
city_water.lan      22                     95.1               97
seg_landcover.lan   29                     94.1               95

Table 6.2.7 summarizes the hybrid classification results. The times for the GML
approach are the sums of the image segmentation, class statistics determination, and
image classification elapsed times. Similarly, the reported elapsed times for the hybrid
ARTMAP approach are the sums of the image segmentation, network creation, and image
classification times.


Table 6.2.7 - Summary of hybrid classification results

image name          hybrid GML   hybrid GML     hybrid ARTMAP   hybrid ARTMAP
                    time (sec)   accuracy (%)   time (sec)      accuracy (%)
-----------------   ----------   ------------   -------------   -------------
city_water.lan      72           93             91              97
seg_landcover.lan   85           92             94              95



Figure 6.2.2 graphically depicts the simple accuracy measurements for the
city_water and seg_landcover image classification results.


Figure  6.2.2  -  task  2  image  classification  accuracy  results 


Figure  6.2.3  depicts  the  required  classification  times.  Note  that  the  reported  time  figure  is 
the  sum  of  the  image  segmentation,  data  modeling  or  network  creation,  and  classification 
operations. 


Figure 6.2.3 - elapsed time for hybrid image classification

Figure 6.2.4 was formed by computing the ratio of the classification accuracy to the
required classification time.

Figure 6.2.4 - ratio of classification accuracy to elapsed time for hybrid image classification
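
The totals in Table 6.2.7 and the ratios plotted in Figure 6.2.4 can be reproduced from the per-step values in Tables 6.2.1 and 6.2.3 through 6.2.6. The short sketch below does this arithmetic; it assumes the denominator of the ratio is the total elapsed time (segmentation plus setup plus classification), and the variable and function names are illustrative only.

```python
# Segmentation times from Table 6.2.1 (seconds).
segmentation = {"city_water.lan": 54, "seg_landcover.lan": 57}

# (setup time, classification time, accuracy in percent) per image.
gml = {"city_water.lan": (3, 15, 93), "seg_landcover.lan": (2, 26, 92)}
artmap = {"city_water.lan": (15, 22, 97), "seg_landcover.lan": (8, 29, 95)}

def summarize(steps):
    """Return {image: (total seconds, accuracy per second)} for one classifier."""
    out = {}
    for name, (setup, classify, accuracy) in steps.items():
        total = segmentation[name] + setup + classify
        out[name] = (total, round(accuracy / total, 2))
    return out

gml_summary = summarize(gml)
artmap_summary = summarize(artmap)
```

Under this assumption the GML ratio is higher for both images, since its small accuracy deficit is outweighed by its shorter elapsed time.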


As was the case in task 1, the fuzzy ARTMAP approach produced the best
accuracy, but at the cost of greater processing time than the GML approach. The effect
of minimized class separation is partly the cause of this observation. It is interesting to
note that when the training data for the classifiers were projected into nPDF space,
classification in all cases would have been difficult if not impossible with that
methodology. The training data also are not necessarily distributed in a multivariate
normal manner. For these reasons, the non-parametric approach may have some inherent
advantage. Note that in this case, as in task 1, the training data were collected by the fuzzy
K-means algorithm. The ARTMAP benefits from this because a spectrally pure, closely
clustered data set is presented for training. As such, it is able to create a relatively small
number of recognition regions in feature space that satisfy the conflicting needs of
within-class generalization and between-class distinction. While the GML
approach inherently minimizes the effect of spurious training exemplars through the
creation of the variance-covariance matrix, the neural network employed in this study
enjoys no such luxury. The GML approach does not greatly benefit from the spectrally
pure training data because outlying data are automatically averaged out. Also note that, if
the training data were collected at a very high membership value, then the training data
may be spectrally colored by the mathematical processes employed by the training data
collection process. This could distort the class orientation information and lead to poor
classification. This effect was minimized by ensuring that the membership value
produced a training set composed of a statistically significant number of values. Task 3
will explore the performance of these algorithms on user-defined data sets. These data
will better describe the spectral extent of a class in feature space at the cost of necessarily
including some "impure" training data.

All  classification  accuracy  measures  were  reported  with  the  simple  accuracy 
measure.  As  was  the  case  in  task  one,  the  values  were  always  found  to  be  in  the  same 
order  with  the  exception  of  the  weighted  accuracy  which  attempts  to  account  for  size  of 
each  class.  A  graph  comparing  the  various  metrics  is  presented  in  appendix  B. 

Figures 6.2.5 and 6.2.6 present the classification maps produced in this task.
They are color coded to match the information in the confusion matrices or class statistics
files to support visual inspection. Class 1 is red, class 2 is green, class 3 is blue, class 4 is
purple, class 5 is yellow, and class 6 is cyan.

Figure 6.2.5 - Hybrid image classification results for a vegetation class in the landcover.lan image (panel: fuzzy ARTMAP)

Figure 6.2.6 - Hybrid image classification results for the water class in the city.lan image (panels: fuzzy K-means, fuzzy ARTMAP, GML)

6.3 Task 3 Results and Discussion

The goal of task 3 was to evaluate the performance of the various classifiers when
trained and evaluated with user-defined data. The training data were collected for the
algorithms by utilizing the user-interactive module. Truth images were constructed by
designating polygons representative of the training classes within the images from which
the training data were collected. The individual polygons were then segmented from the
image to form an evaluation and truth image. The AVS procedure in Figure 6.3.1 was
utilized to create the segmented image and truth image.

Figure 6.3.1 - AVS network to create evaluation polygons and truth images

This network reads in a multispectral image, displays it, and then permits polygon
regions to be overlaid to define the evaluation polygons. This operation is supported by
the Select Polygon Region AVS module. The multispectral data within the polygons are
then passed to the rest of the network. Two important operations then occur in parallel.
In the first operation, the multispectral data from the polygons are written to disk files as
individual images. These individual images are later combined into an evaluation image
with the network used to combine nPDF LUTs. In a similar fashion, the polygons are
passed on to the Fill Polygon Region module, where they are filled with the class number.
These polygons are written to disk and combined to form a truth image to support
measurements of classification accuracy. Figures 6.3.2 through 6.3.5 graphically depict
the training and evaluation polygons used for each class in each image. To aid in
interpretation, training polygons are colored blue while evaluation polygons are filled
with red.

Care was taken during the collection of training data to ensure that statistically
significant numbers of points were included in each training class. Once the training data
were collected, image classification proceeded as previously described. The first step in
the GML process was the creation of the class statistics files. Table 6.3.1 summarizes the
important statistics from this process, and detailed statistics for each image are
presented in appendix B.


Table 6.3.1 - Summary of GML statistics file parameters

image name      number of   number of points   elapsed time to create
                classes     in training set    statistics file (sec)
-------------   ---------   ----------------   ----------------------
city.lan        5           17,502             2
landcover.lan   4           15,422             2
roch84.lan      3           4,407              1
seashore.lan    5           9,588              1

After the statistics files had been created, GML classification was readily accomplished.
Since the image being classified was segmented to support evaluation, the skip zero
vectors option was employed when using the AVS modules. Table 6.3.2 summarizes the
classification parameters and statistics.

Figure 6.3.2 - landcover.lan training and evaluation polygons (1 = scrub, 2 = vegetation, 3 = trees, 4 = bare soil)

Figure 6.3.3 - city.lan training and evaluation polygons (1 = water, 2 = scrub, 3 = bare soil, 4 = roof, 5 = asphalt)

Figure 6.3.4 - roch84.lan training and evaluation polygons (1 = urban, 2 = vegetation, 3 = soil)

Figure 6.3.5 - seashore.lan training and evaluation polygons (1 = water, 2 = grass, 3 = scrub, 4 = concrete, 5 = sand)


Table 6.3.2 - Summary of GML classification statistics

image name      distance    elapsed          percentage of      training       classification
                threshold   classification   image classified   accuracy (%)   accuracy (%)
                            time (sec)
-------------   ---------   --------------   ----------------   ------------   --------------
city.lan        90          4                99                 100            96
landcover.lan   100         5                99                 96             98
roch84.lan      50          3                99                 98             96
seashore.lan    50          4                99                 98             96

The variation in the distance threshold values is to be expected, since the classes in
the user-defined training data vary in spectral extent in feature space. The threshold was
manipulated until 99% of the image was classified, with few pixels being assigned to
the background class. The training accuracy metric in the table represents the
performance of the classifier on the training data. The classification accuracy metric is
the simple accuracy, and all detailed confusion matrices are in appendix B.

Image  classification  with  the  nPDF  algorithm  was  accomplished  next.  The 
training  data  collected  by  the  user  interactive  module  was  projected  into  different  nPDF 
spaces  until  minimal  class  overlap  was  observed.  Table  6.3.3  summarizes  the  parameters 
used  and  the  elapsed  time  required  to  project  the  training  data. 


Table 6.3.3 - Summary of nPDF LUT statistics

image name      hypercube   scale    projection time for   projection time
                corners     factor   training data (sec)   for image (sec)
-------------   ---------   ------   -------------------   ---------------
city.lan        3 & 4       512      4                     51
landcover.lan   1 & 2       512      3                     59
roch84.lan      1 & 4       800      1                     35
seashore.lan    1 & 3       512      1                     32

LUTs  were  then  constructed  from  the  information  in  the  projected  training  data.  Figure 
6.3.6  depicts  the  projected  training  data  and  the  resulting  LUT  for  visual  inspection. 

Figure 6.3.6 - nPDF development for the user-defined training classes (panels: city, landcover, rochester, and seashore training data in nPDF space, each with its resulting LUT)

The LUTs created from the user-defined data proved to be the most difficult to generate.
This problem stems from the fact that some training classes had a relatively small number
of pixels and there was considerable variance in some of the training data. This made
determining classification boundaries a difficult and iterative process. Once the LUTs
were constructed, nPDF classification was performed. The important statistics from the
nPDF classification operation are summarized in Table 6.3.4 below.


Table 6.3.4 - Summary of nPDF classification statistics

image name      elapsed          percentage of      training       classification
                classification   image classified   accuracy (%)   accuracy (%)
                time (sec)
-------------   --------------   ----------------   ------------   --------------
city.lan        3                96                 90             88
landcover.lan   7                97                 95             94
roch84.lan      2                97                 93             94
seashore.lan    3                98                 91             90

Note that it was not possible to achieve the desired percentage of image classification.
Once again, this is because it is very difficult to draw classification boundaries
that do not overlap. The same problem is the cause of the relatively low classification
accuracy on both the dependent training data used in the LUT creation and the independent
data on which the classification accuracy was evaluated.

The  same  data  used  with  the  GML  and  the  nPDF  approach  were  then  utilized  to 
train  the  fuzzy  ARTMAP  neural  network.  The  value  of  the  vigilance  parameter  was 
adjusted  to  achieve  the  desired  level  of  image  classification.  Table  6.3.5  summarizes  the 
network  parameters,  number  of  learning  iterations,  and  the  elapsed  time  required  to  create 
the  networks. 

Table 6.3.5 - Summary of fuzzy ARTMAP network statistics

image name      ρ for    number of     number of learning   elapsed time to create
                ART_a    ART_a nodes   iterations           network (sec)
-------------   ------   -----------   ------------------   ----------------------
city.lan        0.91     45            2                    15
landcover.lan   0.9      40            3                    19
roch84.lan      0.92     58            2                    5
seashore.lan    0.92     28            2                    6

With  the  networks  created,  image  classification  was  then  achieved.  Table  6.3.6 
summarizes  the  resulting  accuracy  measurements  and  the  elapsed  time  required.  The 
training  accuracy  metric  relates  the  performance  of  the  neural  networks  on  the  training 
data.  In  all  cases,  training  was  stopped  once  full  recognition  of  the  training  data  was 
achieved.  Note  that  the  accuracy  measurement  is  the  simple  accuracy  and  is  measured 
with  respect  to  the  truth  image. 


Table 6.3.6 - Summary of fuzzy ARTMAP classification statistics

image name      elapsed time to        percentage of      training       classification
                classify image (sec)   image classified   accuracy (%)   accuracy (%)
-------------   --------------------   ----------------   ------------   --------------
city.lan        18                     99                 100            97
landcover.lan   13                     99                 100            95
roch84.lan      7                      99                 100            96
seashore.lan    6                      99                 100            95

Table  6.3.7  summarizes  the  classification  accuracies  and  elapsed  times. 


Table 6.3.7 - Summary of classification times and accuracies

image name      GML time   GML            nPDF time   nPDF           ARTMAP       ARTMAP
                (sec)      accuracy (%)   (sec)       accuracy (%)   time (sec)   accuracy (%)
-------------   --------   ------------   ---------   ------------   ----------   ------------
city.lan        6          96             55          88             33           97
landcover.lan   6          98             62          94             32           95
roch84.lan      4          96             36          94             12           96
seashore.lan    5          96             33          90             12           95

Figures 6.3.7 and 6.3.8 graphically depict the classification accuracy and elapsed
computing time.

Figure 6.3.7 - Task 3 classification accuracies

Figure 6.3.8 - Task 3 classification elapsed times

Figure 6.3.9 was created by computing the ratio of the classification accuracy to
the required classification time.

Figure 6.3.9 - Task 3 ratio of classification accuracy to classification time

The results of this task highlight the primary concern present when utilizing
nonparametric classifiers. Note that in contrast to its performance in the other tasks, the
GML classifier twice displayed better classification accuracy than the fuzzy ARTMAP
classifier (on landcover.lan and seashore.lan in Table 6.3.7).
The high classification accuracy of the GML classifier can be most easily
attributed to the mathematical properties of the variance-covariance matrix. The matrix
contains information about the shape of the training data distribution, its orientation in
feature space, and its extent. As such, the parametric classifier has the ability to logically
"fill in" missing data points, and the impact of noisy or spurious training data is
automatically averaged out. In contrast, the classification performance of the
nonparametric classifiers depends entirely on the quality of the training data, and they are
inherently unable to account for missing information. Incomplete training data sets will
always be encountered when the training data are interactively determined by the image
analyst. Due to this reality, GML may represent the optimal image classification strategy
when dealing with user-defined training data.

Classification maps from this task are depicted in Figures 6.3.8 through 6.3.11.

Figure 6.3.8 - task 3 classification maps for the city.lan image (panels: fuzzy ARTMAP, nPDF)

Figure 6.3.9 - task 3 classification maps for the landcover.lan image (panels: fuzzy ARTMAP, nPDF)

Figure 6.3.10 - task 3 classification maps for the roch84.lan image (panels: truth image, GML, fuzzy ARTMAP, nPDF)

Figure 6.3.11 - task 3 classification maps for the seashore.lan image (panels: truth image, GML, fuzzy ARTMAP, nPDF)

7.0 Summary


The  fuzzy  K-means  clustering  algorithm  was  shown  to  be  effective  for  both 
creating  spectrally  pure  training  data  and  truth  images  for  measuring  classification 
accuracy.  This  flexibility  is  gained  by  its  employment  of  fuzzy  logic  through  the 
membership  function.  Features  within  an  image  for  which  it  is  very  difficult  to  collect 
training  data,  either  due  to  size  or  sparse  positioning,  can  be  effectively  sampled  with  this 
method.  There  are  three  main  problems  with  this  approach  to  image  classification.  First, 
the  algorithm  is  extremely  computationally  intensive.  This  observation  is  easily 
explained  by  realizing  that  the  membership  of  each  pixel  in  the  image  with  respect  to 
each  desired  cluster  center  must  be  determined  iteratively.  Secondly,  the  clusters  formed 
are  computed  in  an  entirely  unsupervised  manner  and  may  be  difficult  to  visually  label. 
Finally, the data collection methodology inherently colors the training class data, which
destroys some class distribution information.

The  heart  of  this  study  concerns  itself  with  the  manner  in  which  each 
classification  algorithm  divides  feature  space  into  recognition  regions.  Gaussian 
maximum  likelihood  utilizes  hyperellipsoids,  the  nPDF  algorithm  allows  the  analyst  to 
define  arbitrary  boundaries  in  a  projection  of  feature  space,  and  the  fuzzy  ARTMAP 
neural  network  utilizes  stacked  hyper-rectangles  with  exception  handling. 

GML  is  the  classical  approach  to  multispectral  image  classification.  This  study 
has  demonstrated  that  its  classification  accuracy  and  computational  requirements  on 
user-defined  data  are  difficult  to  achieve  by  other  methods,  even  an  advanced  neural 
network.  The  variance-covariance  matrix  at  the  core  of  the  algorithm  provides  not  only 
the  location  of  the  classes  in  feature  space,  but  also  a  measure  of  their  extent  and 
orientation. In addition, the method used to calculate the variance-covariance matrix
from the training data automatically weights the effects of both frequently occurring and
outlying data points. Neither of the non-parametric classifiers addressed in this study is
able to accomplish this. The determination of class extent and orientation is achieved
through the assumption of normally distributed pixels made when calculating the
variance-covariance matrix, which forms the core of the GML data dispersion model. The
normality assumption was shown to follow from the averaging effect of the
sensor and to be reasonable in most remote sensing applications.

The  nPDF  approach  to  image  classification  was  shown  to  uniquely  involve  the 
analyst  in  image  classification.  By  interactively  drawing  class  boundaries  in  a  projection 
of  feature  space,  subtle  variations  in  class  boundaries  can  be  accounted  for  in  a  manner 
that  is  not  possible  algorithmically.  It  is  important  to  note  that  outlying  or  mislabeled 
training  data  are  handled  in  an  extremely  effective  manner.  Incorrect  training  data  are 
automatically  grouped  into  the  correct  class  through  the  projection  operation.  This  facet 
of  the  algorithm  was  exploited  in  the  hybrid  image  classification  task.  The  greatest 
strength  of  this  algorithm  is  its  data  visualization  properties  as  separability  between 
classes  can  be  readily  interpreted.  While  class  separability  can  be  readily  visually 
interpreted,  defining  accurate  boundaries  between  the  classes  proved  to  be  difficult.  In 
addition,  this  method  enjoys  no  real  computational  advantage  in  terms  of  elapsed  time 
required  to  classify  an  image  when  compared  to  GML,  once  the  time  to  project  the 
original  image  to  determine  class  extent  is  included.  The  introduction  to  this  study 
mentioned  that  the  nPDF  algorithm  could  be  potentially  useful  for  determining  the 
number  of  classes  present  in  an  image.  This  facet  of  the  algorithm  proved  impossible  to 
demonstrate  with  the  LANDSAT  or  M-7  imagery  used  in  this  study.  Had  it  been  possible 
to  achieve,  distinct  peaks  in  the  nPDF  projections  of  the  images  would  have  been  noted. 

Image classification with the fuzzy ARTMAP neural network produced intriguing
results. When presented with the spectrally pure training data collected by the fuzzy
K-means algorithm, its classification accuracy was shown to be unparalleled,
with only a slight increase in the time required to train and classify the image as compared to
GML. This strength springs from the employment of fuzzy set theory and ART
dynamics. No other neural network architecture so effectively combines these traits.
When  user-defined  data  is  utilized  with  this  approach,  its  greatest  weaknesses  are 
highlighted.  Large  variations  in  data  coupled  with  cluster  centers  that  are  close  to  one 
another  in  feature  space  result  in  a  neural  network  with  many  small  hyper-rectangles 
dividing  feature  space  into  recognition  regions.  This  case,  which  occurs  often  in  the 
remote  sensing  application,  springs  from  the  attempt  to  achieve  the  conflicting  goals  of 
maximizing  generalization  while  maintaining  separability.  This  results  in  numerous 
computations  being  completed  for  each  pixel  in  the  image  or  training  set.  This  manifests 
itself  as  increased  learning  and  image  classification  times.  If  the  network  has  not  been 
presented  with  the  examples  of  the  complete  spectral  extent  of  a  class,  it  is  not  capable  of 
determining  membership  in  the  manner  that  GML  is.  Since  the  data  distribution  is  not 
modeled,  no  mathematical  inference  other  than  the  fuzzy  "nearness"  can  be  determined. 
While  GML  can  determine  that  a  pixel  not  explicitly  encountered  during  training  should 
"fit"  in  the  distribution  of  one  of  its  classes,  the  neural  network  cannot.  It  requires 
training  sets  composed  of  pixels  that  both  are  spectrally  pure  and  that  completely  define 
the  spectral  extent  of  the  classes  in  feature  space. 

The  varying  measurements  of  classification  accuracy  employed  for  this  study 
were  found  to  always  follow  the  same  pattern.  The  simple  accuracy  consistently 
provided  the  greatest  measure  of  classification  accuracy,  followed  by  Brennan  and 
Prediger's  kappa,  while  the  standard  kappa  coefficient  always  provided  the  worst  measure 
of  accuracy.  The  weighted  accuracy,  which  attempted  to  account  for  class  size,  produced 
sporadic  results.  To  compare  the  classification  accuracy  performance  of  the  various 
algorithms,  any  measurement  could  be  reported,  and  the  simple  accuracy  was  utilized  in 
this  study. 
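
The three agreement measures named above can be computed from a confusion matrix as sketched below. The sketch uses the standard definitions of simple accuracy, Cohen's kappa, and Brennan and Prediger's kappa; the weighted accuracy is omitted because its definition is not restated in this section. The function name and the toy two-class confusion matrix are hypothetical and are not data from this study.

```python
def accuracy_measures(cm):
    """Return (simple accuracy, Brennan-Prediger kappa, standard kappa).

    cm[i][j] : number of pixels of true class i assigned to class j
    """
    k = len(cm)
    n = sum(map(sum, cm))
    p_o = sum(cm[i][i] for i in range(k)) / n            # observed agreement
    rows = [sum(r) / n for r in cm]                      # true-class proportions
    cols = [sum(cm[i][j] for i in range(k)) / n for j in range(k)]
    p_e = sum(r * c for r, c in zip(rows, cols))         # chance agreement
    kappa = (p_o - p_e) / (1 - p_e)
    kappa_bp = (p_o - 1 / k) / (1 - 1 / k)               # chance fixed at 1/k
    return p_o, kappa_bp, kappa

# Toy confusion matrix with one dominant class.
cm = [[80, 5],
      [5, 10]]
simple, kappa_bp, kappa = accuracy_measures(cm)
```

For this unbalanced example the simple accuracy is the largest value and the standard kappa the smallest, the same ordering reported for the measurements in this study.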

In general, it appears that the GML approach to multispectral image classification
has not been "dethroned" by either of the non-parametric classifiers utilized in this study.
The assumption of normally distributed training and target class data is reasonable. This
assumption permits the algorithm to "fill in" missing data that were not present when the
data dispersion model was created. Neither of the non-parametric classifiers observed in
this study is able to accomplish this. Given robust, spectrally pure training data, the
fuzzy ARTMAP may provide slightly higher classification accuracies, but this comes at
the expense of considerable complexity. Given classes that are readily spectrally
separable, the nPDF algorithm produced reasonable results. In certain situations its
performance may be optimal. In general, it is hampered by its inherent projection
methodology.

The  classification  algorithms  and  methods  employed  in  this  study  were  evaluated 
under  ideal  "laboratory"  conditions.  As  such,  some  comments  on  transitioning  this 
system  to  an  operational  role  are  warranted.  It  is  obvious  that  training  each  algorithm  on 
each  class  in  each  image  to  be  classified  is  overly  time  consuming.  It  would  be  desirable 
to  train  the  classification  algorithms  on  various  target  classes  of  interest  and  then  be  able 
to  classify  any  given  image.  All  of  the  classification  algorithms  rely  on  the  digital  count 
values  present  in  an  image  to  distinguish  between  classes.  These  digital  count  values  are 
entirely  dependent  upon  the  imaging  geometry  and  atmosphere  present  at  image 
acquisition  time.  As  such,  some  method  must  be  employed  to  remove  these  effects. 
Typically  these  methods  either  model  the  contributions  of  the  atmosphere  at  imaging
time,  or  more  simply  scale  the  digital  count  values  present  in  one  image  to  match  those
of  the  image  on  which  the  classifier  was  trained.  As  expected,  there  will  be  some
loss  of  information  or  introduction  of  error  when  either  of  the  preceding  approaches  is
applied.  Therefore,  while  classification  results  will  not  be  as  accurate  as  if  the  classifier
had  been  trained  with  data  from  the  image  to  be  classified,  considerable  time  will  be  saved  by
applying  the  previously  determined  classification  models.
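
The simpler of the two approaches, scaling digital counts to match the training image, can be illustrated as follows. This is only a sketch under a stated assumption: a per-band linear gain and offset chosen so that the mean and standard deviation of the new image match those of the training image. The study does not specify a scaling method, and the function name is hypothetical.

```python
import statistics


def match_band(new_band, training_band):
    """Linearly rescale one band of a new image to the training image.

    Both arguments are flat lists of digital counts for the same band.
    Returns the rescaled counts together with the gain and offset used.
    Assumes the two images contain statistically similar scene content.
    """
    new_mean = statistics.fmean(new_band)
    new_std = statistics.pstdev(new_band)
    ref_mean = statistics.fmean(training_band)
    ref_std = statistics.pstdev(training_band)
    if new_std == 0.0:
        raise ValueError("band has no variation; gain is undefined")

    gain = ref_std / new_std
    offset = ref_mean - gain * new_mean
    return [gain * dc + offset for dc in new_band], gain, offset


# Hypothetical counts: the training image is brighter and has more contrast.
new_counts = [10, 20, 30, 40]
training_counts = [25, 45, 65, 85]
scaled, gain, offset = match_band(new_counts, training_counts)
```

A mean and standard deviation match of this kind ignores any difference in scene content between the two images, which is one source of the error noted above.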

It  is  also  interesting  to  note  that  the  classification  stage  of  each  algorithm  is 
inherently  parallelizable.  Any  problem  which  can  be  readily  divided  and  computed 
separately  on  multiple  processors  shares  this  quality.  Since  the  classification  results  from 
the  algorithms  in  this  study  depend  only  on  the  digital  count  values  of  the  pixel  in 
question,  the  classification  operation  can  be  easily  divided  across  several  processing 
units.  This  concept  is  easy  to  envision.  Consider  the  case  where  we  simply  divide  the
input  image  into  as  many  subimages  as  there  are  available  processors.  The  subimages  could  then
be  classified  by  separate  processors  and  recombined  to  form  a  classification  map.  As
such,  a  nearly  linear  decrease  in  elapsed  classification  time  can  be  realized.
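
The divide, classify, and recombine scheme described above might be sketched as follows. This is an illustration only: the function names and the toy per-pixel classifier are hypothetical, and the image is assumed to be a list of rows of per-pixel digital count tuples. A thread pool is used to keep the example self-contained; in CPython a process pool (for example `concurrent.futures.ProcessPoolExecutor` with a picklable classifier) would be needed to obtain a real speedup on CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor


def split_rows(image, n_parts):
    """Split a list of image rows into at most n_parts contiguous strips."""
    size, extra = divmod(len(image), n_parts)
    strips, start = [], 0
    for i in range(n_parts):
        stop = start + size + (1 if i < extra else 0)
        if stop > start:
            strips.append(image[start:stop])
        start = stop
    return strips


def classify_strip(strip, classify_pixel):
    """Classify every pixel of a strip independently of all other pixels."""
    return [[classify_pixel(pixel) for pixel in row] for row in strip]


def classify_image(image, classify_pixel, n_workers=4,
                   executor_cls=ThreadPoolExecutor):
    """Classify strips concurrently and reassemble the classification map."""
    strips = split_rows(image, n_workers)
    with executor_cls(max_workers=n_workers) as pool:
        # map() returns results in strip order, so rows stay in place.
        results = pool.map(classify_strip, strips,
                           [classify_pixel] * len(strips))
        return [row for strip in results for row in strip]


def toy_classifier(pixel):
    """Hypothetical two-class rule based on the first band only."""
    return 1 if pixel[0] >= 20 else 0


image = [[(r * 10 + c, 0) for c in range(3)] for r in range(5)]
class_map = classify_image(image, toy_classifier, n_workers=2)
```

Because each pixel is classified from its own digital counts alone, the recombined map is identical to the one a single processor would produce.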


7.1  Suggestions  for  Future  Work

It  would  be  interesting  to  allow  the  user  to  select  the  starting  cluster  locations 
interactively  for  the  fuzzy  K-means  algorithm  instead  of  selecting  them  in  a 
pseudorandom  fashion.  This  would  give  the  analyst  some  control  of  the  resulting  cluster 
centers  and  make  labeling  the  resulting  classes  somewhat  easier. 
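
A sketch of how analyst-selected starting locations could be passed to a fuzzy K-means routine is given below. It is not the AVS module from this study. It assumes the standard fuzzy c-means update equations with fuzziness exponent m, and all names are hypothetical.

```python
import math


def memberships(pixels, centers, m=2.0):
    """Fuzzy membership of every pixel in every cluster (rows sum to 1)."""
    power = -2.0 / (m - 1.0)
    table = []
    for x in pixels:
        dists = [math.dist(x, c) for c in centers]
        if 0.0 in dists:
            # Pixel coincides with one or more centers: share full
            # membership among those centers only.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            table.append([h / sum(hits) for h in hits])
        else:
            inv = [d ** power for d in dists]
            table.append([v / sum(inv) for v in inv])
    return table


def fuzzy_kmeans(pixels, start_centers, m=2.0, max_iter=100, tol=1e-6):
    """Iterate from analyst-supplied centers instead of random ones."""
    centers = [list(c) for c in start_centers]
    n_bands = len(pixels[0])
    for _ in range(max_iter):
        u = memberships(pixels, centers, m)
        new_centers = []
        for k in range(len(centers)):
            w = [row[k] ** m for row in u]
            total = sum(w)
            new_centers.append(
                [sum(wi * x[b] for wi, x in zip(w, pixels)) / total
                 for b in range(n_bands)])
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships(pixels, centers, m)


# Hypothetical two-band pixels and two seeds picked by the analyst.
pixels = [(10, 12), (12, 10), (11, 11), (200, 198), (198, 200), (199, 199)]
seeds = [(0, 0), (255, 255)]  # cluster 0 = dark seed, cluster 1 = bright seed
centers, u = fuzzy_kmeans(pixels, seeds)
```

Since the clusters keep the order of the seeds, the analyst knows in advance which output cluster grew from which seed, which is what makes labeling easier.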

While  the  distance  measure  of  the  GML  classification  algorithm  allows  the 
extent  of  classes  to  be  controlled,  no  individual  parameter  for  each  class  is  provided  other 
than  the  measure  arising  from  the  determinant  of  the  variance-covariance  matrix.  If 
individual  distance  measures  were  employed,  varying  class  extent  in  feature  space  could 
be  controlled  and  compensated. 
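
One possible form of such a per-class control is sketched below for two-band data. The sketch assumes equal prior probabilities, the usual Gaussian discriminant built from the log determinant and the squared Mahalanobis distance, and a separate distance threshold for each class. The threshold values, class statistics, and function names are hypothetical and are not taken from this study.

```python
import math


def mahalanobis_sq(x, mean, cov):
    """Squared Mahalanobis distance for a 2x2 symmetric covariance matrix."""
    (a, b), (_, d) = cov
    det = a * d - b * b
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Quadratic form with the closed-form inverse of [[a, b], [b, d]].
    return (d * dx * dx - 2.0 * b * dx * dy + a * dy * dy) / det


def gml_classify(x, classes, thresholds):
    """Return the most likely class, or None if x lies outside every class.

    classes maps label -> (mean, cov); thresholds maps label -> the largest
    squared Mahalanobis distance accepted for that class.
    """
    best_label, best_score = None, -math.inf
    for label, (mean, cov) in classes.items():
        d2 = mahalanobis_sq(x, mean, cov)
        if d2 > thresholds[label]:
            continue  # pixel is outside this class's allowed extent
        (a, b), (_, d) = cov
        score = -math.log(a * d - b * b) - d2  # equal priors assumed
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical class statistics (mean vector, covariance matrix).
classes = {
    "water": ((20.0, 10.0), ((4.0, 0.0), (0.0, 4.0))),
    "grass": ((60.0, 90.0), ((25.0, 5.0), (5.0, 16.0))),
}
tight = {"water": 9.0, "grass": 9.0}
loose_grass = {"water": 9.0, "grass": 150.0}

label_tight = gml_classify((40.0, 50.0), classes, tight)
label_loose = gml_classify((40.0, 50.0), classes, loose_grass)
```

With the tight thresholds the test pixel is left unclassified, while loosening only the grass threshold assigns it to grass without changing the extent of the water class.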

The  effects  of  varying  complement  coding  for  the  fuzzy  ARTMAP  neural
network  were  not  explored.  Reduced  classification  and  learning  times  might  be  achieved,
but  the  impact  on  classification  accuracy  cannot  be  predicted.
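
For context, complement coding appends to each normalized input vector its complement, which doubles the input dimension. The short sketch below illustrates the operation itself. The 8-bit normalization and the function names are assumptions made for the example and are not taken from the implementation used here.

```python
def normalize_counts(pixel, max_count=255):
    """Scale digital counts to [0, 1]; 8-bit data is assumed by default."""
    return [dc / max_count for dc in pixel]


def complement_code(a):
    """Return the complement-coded vector (a, 1 - a) for a in [0, 1]^M."""
    if any(v < 0.0 or v > 1.0 for v in a):
        raise ValueError("inputs must be normalized to [0, 1]")
    return list(a) + [1.0 - v for v in a]


# Hypothetical three-band pixel.
coded = complement_code(normalize_counts((51, 102, 255)))
```

The coded vector has twice as many components as the original pixel, and its components always sum to the number of bands. The doubled dimension is why reduced coding might shorten learning and classification times.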

No  attempt  was  made  to  study  how  departures  from  normality  affect  the  classification
accuracy  of  the  different  algorithms.  It  would  be  interesting  to  study  this  effect  by
intentionally  skewing  the  distributions  of  the  training  and  evaluation  data.
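
One way such an experiment could be set up is sketched below. The exponential warp is only one of many possible ways to skew a sample and is an assumption of this example, as are the function names and the synthetic data. The warped values are rescaled so that the mean and standard deviation are unchanged and only the shape of the distribution differs.

```python
import math
import random
import statistics


def skewness(xs):
    """Sample skewness (third standardized moment)."""
    mean = statistics.fmean(xs)
    std = statistics.pstdev(xs)
    return sum(((x - mean) / std) ** 3 for x in xs) / len(xs)


def skew_samples(xs, strength):
    """Skew a sample while preserving its mean and standard deviation.

    strength = 0 returns the data unchanged; larger positive values give a
    longer right-hand tail.
    """
    if strength == 0:
        return list(xs)
    mean = statistics.fmean(xs)
    std = statistics.pstdev(xs)
    # Exponential warp of the standardized values (log-normal-like shape).
    warped = [math.exp(strength * (x - mean) / std) for x in xs]
    w_mean = statistics.fmean(warped)
    w_std = statistics.pstdev(warped)
    return [mean + std * (w - w_mean) / w_std for w in warped]


# Synthetic, normally distributed digital counts for one band of one class.
random.seed(0)
normal_counts = [random.gauss(100.0, 10.0) for _ in range(5000)]
skewed_counts = skew_samples(normal_counts, strength=0.5)
```

Training and evaluating each classifier on data skewed to increasing strengths would then show how quickly each one degrades as the normality assumption is violated.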

Source  code  for  the  fuzzy  K-means  AVS  module